{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Поиск наиболее подходящих товаров (draft recommendations).\n",
    "\n",
    "Муравьёв Александр, Telegram: t.me/samohod4ik"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Описание проекта\n",
    "\n",
    "В рамках проекта вы поработаете с реальными сырыми данными от одного из крупнейших маркетплейсов страны.\n",
    "Вас ждет интересная задача сопоставления и поиска наиболее похожих товаров.\n",
    "Сопоставление или “мэтчинг” (англ. matching - соответствия) - одна из базовых задач машинного обучения, которая встречается в информационном поиске, компьютерном зрении, рекомендательных системах и др.\n",
    "\n",
    "Вы познакомитесь с алгоритмами приближённого поиска ближайщих соседей, научитесь создавать индексы в векторных базах данных и обучать ранжирующие модели. Эти навыки востребованы во многих сферах и позволят вам решать еще более интересные и сложные задачи.\n",
    "А по завершению проекта вы сможете пополнить резюме ещё одним интересным кейсом в портфолио."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Задача\n",
    "\n",
    "- разработать алгоритм, который для всех товаров из validation.csv предложит несколько вариантов наиболее похожих товаров из base;\n",
    "- оценить качество алгоритма по метрике accuracy@5"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Используемые библиотеки\n",
    "\n",
    "- catboost\n",
    "- imblearn\n",
    "- matplotlib\n",
    "- numpy\n",
    "- pandas\n",
    "- sklearn\n",
    "- faiss-cpu"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Данные\n",
    "\n",
    "base.csv - анонимизированный набор товаров. Каждый товар представлен как уникальный id (0-base, 1-base, 2-base) и вектор признаков размерностью 72.\n",
    "\n",
    "train.csv - обучающий датасет. Каждая строчка - один товар, для которого известен уникальный id (0-query, 1-query, …) , вектор признаков И id товара из base.csv, который максимально похож на него (по мнению экспертов).\n",
    "\n",
    "validation.csv - датасет с товарами (уникальный id и вектор признаков), для которых надо найти наиболее близкие товары из base.csv.\n",
    "\n",
    "validation_answer.csv - правильные ответы к предыдущему файлу."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Установка, импорт необходимых билиотек и знакомство с данными."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting faiss-cpu\n",
      "  Using cached faiss_cpu-1.7.4-cp311-cp311-win_amd64.whl (10.8 MB)\n",
      "Installing collected packages: faiss-cpu\n",
      "Successfully installed faiss-cpu-1.7.4\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: You are using pip version 21.3.1; however, version 23.2.1 is available.\n",
      "You should consider upgrading via the 'C:\\Users\\alexm\\DataSpellEnvironment\\Scripts\\python.exe -m pip install --upgrade pip' command.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: catboost in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (1.2.1)\n",
      "Requirement already satisfied: scipy in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (1.11.2)\n",
      "Requirement already satisfied: plotly in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (5.16.1)\n",
      "Requirement already satisfied: graphviz in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (0.20.1)\n",
      "Requirement already satisfied: matplotlib in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (3.6.2)\n",
      "Requirement already satisfied: pandas>=0.24 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (1.5.2)\n",
      "Requirement already satisfied: numpy>=1.16.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (1.24.0)\n",
      "Requirement already satisfied: six in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from catboost) (1.16.0)\n",
      "Requirement already satisfied: python-dateutil>=2.8.1 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from pandas>=0.24->catboost) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from pandas>=0.24->catboost) (2022.7)\n",
      "Requirement already satisfied: pyparsing>=2.2.1 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (3.0.9)\n",
      "Requirement already satisfied: pillow>=6.2.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (9.3.0)\n",
      "Requirement already satisfied: fonttools>=4.22.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (4.38.0)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (1.4.4)\n",
      "Requirement already satisfied: packaging>=20.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (22.0)\n",
      "Requirement already satisfied: cycler>=0.10 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (0.11.0)\n",
      "Requirement already satisfied: contourpy>=1.0.1 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from matplotlib->catboost) (1.0.6)\n",
      "Requirement already satisfied: tenacity>=6.2.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from plotly->catboost) (8.2.3)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: You are using pip version 21.3.1; however, version 23.2.1 is available.\n",
      "You should consider upgrading via the 'C:\\Users\\alexm\\DataSpellEnvironment\\Scripts\\python.exe -m pip install --upgrade pip' command.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: imblearn in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (0.0)\n",
      "Requirement already satisfied: imbalanced-learn in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from imblearn) (0.11.0)\n",
      "Requirement already satisfied: numpy>=1.17.3 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from imbalanced-learn->imblearn) (1.24.0)\n",
      "Requirement already satisfied: threadpoolctl>=2.0.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from imbalanced-learn->imblearn) (3.2.0)\n",
      "Requirement already satisfied: joblib>=1.1.1 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from imbalanced-learn->imblearn) (1.3.2)\n",
      "Requirement already satisfied: scipy>=1.5.0 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from imbalanced-learn->imblearn) (1.11.2)\n",
      "Requirement already satisfied: scikit-learn>=1.0.2 in c:\\users\\alexm\\dataspellenvironment\\lib\\site-packages (from imbalanced-learn->imblearn) (1.3.0)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: You are using pip version 21.3.1; however, version 23.2.1 is available.\n",
      "You should consider upgrading via the 'C:\\Users\\alexm\\DataSpellEnvironment\\Scripts\\python.exe -m pip install --upgrade pip' command.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting sklearn\n",
      "  Using cached sklearn-0.0.post9.tar.gz (3.6 kB)\n",
      "  Preparing metadata (setup.py): started\n",
      "  Preparing metadata (setup.py): finished with status 'done'\n",
      "Building wheels for collected packages: sklearn\n",
      "  Building wheel for sklearn (setup.py): started\n",
      "  Building wheel for sklearn (setup.py): finished with status 'done'\n",
      "  Created wheel for sklearn: filename=sklearn-0.0.post9-py3-none-any.whl size=2361 sha256=2f98c0b67a424b78bed6c3e5b41f4b57e33264a86339d81e862057de1bf2609a\n",
      "  Stored in directory: c:\\users\\alexm\\appdata\\local\\pip\\cache\\wheels\\ef\\63\\d1\\f1671e1e93b7ef4d35df483f9b2485e6dd21941da9a92296fb\n",
      "Successfully built sklearn\n",
      "Installing collected packages: sklearn\n",
      "Successfully installed sklearn-0.0.post9\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING: You are using pip version 21.3.1; however, version 23.2.1 is available.\n",
      "You should consider upgrading via the 'C:\\Users\\alexm\\DataSpellEnvironment\\Scripts\\python.exe -m pip install --upgrade pip' command.\n"
     ]
    }
   ],
   "source": [
    "# Установка необходимых уникальных библиотек\n",
    "!pip install faiss-cpu\n",
    "!pip install catboost\n",
    "!pip install imblearn\n",
    "!pip install sklearn"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "# Импорт необходимых элементов\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import faiss\n",
    "\n",
    "from catboost import CatBoostClassifier\n",
    "from imblearn.pipeline import make_pipeline\n",
    "from imblearn.over_sampling import SMOTE\n",
    "from sklearn.preprocessing import StandardScaler"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [],
   "source": [
    "# Получаем данные\n",
    "df_base = pd.read_csv('datasets\\\\base.csv', index_col=0)\n",
    "df_train = pd.read_csv('datasets\\\\train.csv', index_col=0)\n",
    "df_validation = pd.read_csv('datasets\\\\validation.csv', index_col=0)\n",
    "df_validation_answer = pd.read_csv('datasets\\\\validation_answer.csv', index_col=0)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [],
   "source": [
    "# Объявим функцию для исследования данных, так как у нас 4 датасета.\n",
    "def describe_dataframe(dataframe):\n",
    "    display(dataframe.head(10))\n",
    "    display(dataframe.info())\n",
    "    display(dataframe.describe(percentiles=[.5]).T)\n",
    "    print(f\"Количество дублированных строк: {dataframe.duplicated().sum()}\")"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [
    {
     "data": {
      "text/plain": "                 0          1          2          3           4           5  \\\nId                                                                            \n0-base -115.083890  11.152912 -64.426760 -118.88089  216.482440 -104.698060   \n1-base  -34.562202  13.332763 -69.787610 -166.53348   57.680607  -86.098370   \n2-base  -54.233746   6.379371 -29.210136 -133.41383  150.895830  -99.435326   \n3-base  -87.520130   4.037884 -87.803030 -185.06763   76.369540  -58.985165   \n4-base  -72.743850   6.522049  43.671265 -140.60803    5.820023 -112.074080   \n5-base  -50.510876   6.740296 -81.952030 -142.06926  129.064470 -121.037380   \n6-base -132.349780  12.640369 -80.635895 -137.11795   89.345490  -94.853520   \n7-base  -80.561770   9.547482 -44.353603 -141.60060  133.501530  -72.643170   \n8-base -110.159720   5.319833   8.020306 -172.44500   79.661644 -100.075910   \n9-base  -70.979260   4.714583 -88.550476 -188.70183  137.075880 -115.666030   \n\n                 6          7           8           9  ...          62  \\\nId                                                     ...               \n0-base -469.070588  44.348083  120.915344  181.449700  ...  -42.808693   \n1-base  -85.076666 -35.637436  119.718636  195.234190  ... -117.767525   \n2-base   52.554795  62.381706  128.951450  164.381470  ...  -76.397800   \n3-base -383.182845 -33.611237  122.031910  136.233580  ...  -70.647940   \n4-base -397.711282  45.182500  122.167180  112.119064  ...  -57.199104   \n5-base -365.401703  79.924280  124.752650  102.136750  ... -103.298900   \n6-base -462.933977  91.356030  126.557274  147.394900  ...  -55.650047   \n7-base -484.413795  -6.601168  122.312370  103.568220  ...  -55.613580   \n8-base   -2.583556  28.758438  122.134224  174.900270  ...  -54.116820   \n9-base -209.702523  63.176735  114.353775  117.680900  ...  -34.780470   \n\n               63         64          65         66         67          68  \\\nId                                                                           \n0-base  38.800827 -151.76218  -74.389090  63.666340  -4.703861   92.933610   \n1-base  41.100000 -157.82940  -94.446806  68.202110  24.346846  179.937930   \n2-base  46.011803 -207.14442  127.325570  65.566180  66.325680   81.073490   \n3-base  -6.358921 -147.20105  -37.692750  66.202890 -20.566910  137.206940   \n4-base  56.642403 -159.35184   85.944724  66.766320  -2.505783   65.315285   \n5-base  28.675972 -208.37845  -78.293455  66.580765  70.894360   30.805370   \n6-base  29.008305 -138.24612  156.300510  67.054200 -25.324776   85.734146   \n7-base  55.202328 -179.15207   48.050861  66.357260  26.573547  115.890150   \n8-base  36.942806 -133.81061  -23.802479  69.412280 -51.071934   91.097830   \n9-base  55.108017 -168.94844  -36.150143  67.762600  49.744743  -16.176880   \n\n                69           70         71  \nId                                          \n0-base  115.269190  -112.756640 -60.830353  \n1-base  116.834000   -84.888941 -59.524610  \n2-base  116.594154 -1074.464888 -32.527206  \n3-base  117.474100 -1074.464888 -72.915490  \n4-base  135.051590 -1074.464888   0.319401  \n5-base  134.891560  -913.638206 -30.293541  \n6-base  138.853520 -1070.516278  -2.041809  \n7-base  110.674900 -1074.464888 -69.883660  \n8-base  108.008660  -426.686160 -12.405426  \n9-base  139.030990  -440.837882 -98.691440  \n\n[10 rows x 72 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n      <th>3</th>\n      <th>4</th>\n      <th>5</th>\n      <th>6</th>\n      <th>7</th>\n      <th>8</th>\n      <th>9</th>\n      <th>...</th>\n      <th>62</th>\n      <th>63</th>\n      <th>64</th>\n      <th>65</th>\n      <th>66</th>\n      <th>67</th>\n      <th>68</th>\n      <th>69</th>\n      <th>70</th>\n      <th>71</th>\n    </tr>\n    <tr>\n      <th>Id</th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0-base</th>\n      <td>-115.083890</td>\n      <td>11.152912</td>\n      <td>-64.426760</td>\n      <td>-118.88089</td>\n      <td>216.482440</td>\n      <td>-104.698060</td>\n      <td>-469.070588</td>\n      <td>44.348083</td>\n      <td>120.915344</td>\n      <td>181.449700</td>\n      <td>...</td>\n      <td>-42.808693</td>\n      <td>38.800827</td>\n      <td>-151.76218</td>\n      <td>-74.389090</td>\n      <td>63.666340</td>\n      <td>-4.703861</td>\n      <td>92.933610</td>\n      <td>115.269190</td>\n      <td>-112.756640</td>\n      <td>-60.830353</td>\n    </tr>\n    <tr>\n      <th>1-base</th>\n      <td>-34.562202</td>\n      <td>13.332763</td>\n      <td>-69.787610</td>\n      <td>-166.53348</td>\n      <td>57.680607</td>\n      <td>-86.098370</td>\n      <td>-85.076666</td>\n      <td>-35.637436</td>\n      <td>119.718636</td>\n      <td>195.234190</td>\n      <td>...</td>\n      <td>-117.767525</td>\n      <td>41.100000</td>\n      <td>-157.82940</td>\n      <td>-94.446806</td>\n      <td>68.202110</td>\n      <td>24.346846</td>\n      <td>179.937930</td>\n      <td>116.834000</td>\n      <td>-84.888941</td>\n      <td>-59.524610</td>\n    </tr>\n    <tr>\n      <th>2-base</th>\n      <td>-54.233746</td>\n      <td>6.379371</td>\n      <td>-29.210136</td>\n      <td>-133.41383</td>\n      <td>150.895830</td>\n      <td>-99.435326</td>\n      <td>52.554795</td>\n      <td>62.381706</td>\n      <td>128.951450</td>\n      <td>164.381470</td>\n      <td>...</td>\n      <td>-76.397800</td>\n      <td>46.011803</td>\n      <td>-207.14442</td>\n      <td>127.325570</td>\n      <td>65.566180</td>\n      <td>66.325680</td>\n      <td>81.073490</td>\n      <td>116.594154</td>\n      <td>-1074.464888</td>\n      <td>-32.527206</td>\n    </tr>\n    <tr>\n      <th>3-base</th>\n      <td>-87.520130</td>\n      <td>4.037884</td>\n      <td>-87.803030</td>\n      <td>-185.06763</td>\n      <td>76.369540</td>\n      <td>-58.985165</td>\n      <td>-383.182845</td>\n      <td>-33.611237</td>\n      <td>122.031910</td>\n      <td>136.233580</td>\n      <td>...</td>\n      <td>-70.647940</td>\n      <td>-6.358921</td>\n      <td>-147.20105</td>\n      <td>-37.692750</td>\n      <td>66.202890</td>\n      <td>-20.566910</td>\n      <td>137.206940</td>\n      <td>117.474100</td>\n      <td>-1074.464888</td>\n      <td>-72.915490</td>\n    </tr>\n    <tr>\n      <th>4-base</th>\n      <td>-72.743850</td>\n      <td>6.522049</td>\n      <td>43.671265</td>\n      <td>-140.60803</td>\n      <td>5.820023</td>\n      <td>-112.074080</td>\n      <td>-397.711282</td>\n      <td>45.182500</td>\n      <td>122.167180</td>\n      <td>112.119064</td>\n      <td>...</td>\n      <td>-57.199104</td>\n      <td>56.642403</td>\n      <td>-159.35184</td>\n      <td>85.944724</td>\n      <td>66.766320</td>\n      <td>-2.505783</td>\n      <td>65.315285</td>\n      <td>135.051590</td>\n      <td>-1074.464888</td>\n      <td>0.319401</td>\n    </tr>\n    <tr>\n      <th>5-base</th>\n      <td>-50.510876</td>\n      <td>6.740296</td>\n      <td>-81.952030</td>\n      <td>-142.06926</td>\n      <td>129.064470</td>\n      <td>-121.037380</td>\n      <td>-365.401703</td>\n      <td>79.924280</td>\n      <td>124.752650</td>\n      <td>102.136750</td>\n      <td>...</td>\n      <td>-103.298900</td>\n      <td>28.675972</td>\n      <td>-208.37845</td>\n      <td>-78.293455</td>\n      <td>66.580765</td>\n      <td>70.894360</td>\n      <td>30.805370</td>\n      <td>134.891560</td>\n      <td>-913.638206</td>\n      <td>-30.293541</td>\n    </tr>\n    <tr>\n      <th>6-base</th>\n      <td>-132.349780</td>\n      <td>12.640369</td>\n      <td>-80.635895</td>\n      <td>-137.11795</td>\n      <td>89.345490</td>\n      <td>-94.853520</td>\n      <td>-462.933977</td>\n      <td>91.356030</td>\n      <td>126.557274</td>\n      <td>147.394900</td>\n      <td>...</td>\n      <td>-55.650047</td>\n      <td>29.008305</td>\n      <td>-138.24612</td>\n      <td>156.300510</td>\n      <td>67.054200</td>\n      <td>-25.324776</td>\n      <td>85.734146</td>\n      <td>138.853520</td>\n      <td>-1070.516278</td>\n      <td>-2.041809</td>\n    </tr>\n    <tr>\n      <th>7-base</th>\n      <td>-80.561770</td>\n      <td>9.547482</td>\n      <td>-44.353603</td>\n      <td>-141.60060</td>\n      <td>133.501530</td>\n      <td>-72.643170</td>\n      <td>-484.413795</td>\n      <td>-6.601168</td>\n      <td>122.312370</td>\n      <td>103.568220</td>\n      <td>...</td>\n      <td>-55.613580</td>\n      <td>55.202328</td>\n      <td>-179.15207</td>\n      <td>48.050861</td>\n      <td>66.357260</td>\n      <td>26.573547</td>\n      <td>115.890150</td>\n      <td>110.674900</td>\n      <td>-1074.464888</td>\n      <td>-69.883660</td>\n    </tr>\n    <tr>\n      <th>8-base</th>\n      <td>-110.159720</td>\n      <td>5.319833</td>\n      <td>8.020306</td>\n      <td>-172.44500</td>\n      <td>79.661644</td>\n      <td>-100.075910</td>\n      <td>-2.583556</td>\n      <td>28.758438</td>\n      <td>122.134224</td>\n      <td>174.900270</td>\n      <td>...</td>\n      <td>-54.116820</td>\n      <td>36.942806</td>\n      <td>-133.81061</td>\n      <td>-23.802479</td>\n      <td>69.412280</td>\n      <td>-51.071934</td>\n      <td>91.097830</td>\n      <td>108.008660</td>\n      <td>-426.686160</td>\n      <td>-12.405426</td>\n    </tr>\n    <tr>\n      <th>9-base</th>\n      <td>-70.979260</td>\n      <td>4.714583</td>\n      <td>-88.550476</td>\n      <td>-188.70183</td>\n      <td>137.075880</td>\n      <td>-115.666030</td>\n      <td>-209.702523</td>\n      <td>63.176735</td>\n      <td>114.353775</td>\n      <td>117.680900</td>\n      <td>...</td>\n      <td>-34.780470</td>\n      <td>55.108017</td>\n      <td>-168.94844</td>\n      <td>-36.150143</td>\n      <td>67.762600</td>\n      <td>49.744743</td>\n      <td>-16.176880</td>\n      <td>139.030990</td>\n      <td>-440.837882</td>\n      <td>-98.691440</td>\n    </tr>\n  </tbody>\n</table>\n<p>10 rows × 72 columns</p>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 2918139 entries, 0-base to 4744766-base\n",
      "Data columns (total 72 columns):\n",
      " #   Column  Dtype  \n",
      "---  ------  -----  \n",
      " 0   0       float64\n",
      " 1   1       float64\n",
      " 2   2       float64\n",
      " 3   3       float64\n",
      " 4   4       float64\n",
      " 5   5       float64\n",
      " 6   6       float64\n",
      " 7   7       float64\n",
      " 8   8       float64\n",
      " 9   9       float64\n",
      " 10  10      float64\n",
      " 11  11      float64\n",
      " 12  12      float64\n",
      " 13  13      float64\n",
      " 14  14      float64\n",
      " 15  15      float64\n",
      " 16  16      float64\n",
      " 17  17      float64\n",
      " 18  18      float64\n",
      " 19  19      float64\n",
      " 20  20      float64\n",
      " 21  21      float64\n",
      " 22  22      float64\n",
      " 23  23      float64\n",
      " 24  24      float64\n",
      " 25  25      float64\n",
      " 26  26      float64\n",
      " 27  27      float64\n",
      " 28  28      float64\n",
      " 29  29      float64\n",
      " 30  30      float64\n",
      " 31  31      float64\n",
      " 32  32      float64\n",
      " 33  33      float64\n",
      " 34  34      float64\n",
      " 35  35      float64\n",
      " 36  36      float64\n",
      " 37  37      float64\n",
      " 38  38      float64\n",
      " 39  39      float64\n",
      " 40  40      float64\n",
      " 41  41      float64\n",
      " 42  42      float64\n",
      " 43  43      float64\n",
      " 44  44      float64\n",
      " 45  45      float64\n",
      " 46  46      float64\n",
      " 47  47      float64\n",
      " 48  48      float64\n",
      " 49  49      float64\n",
      " 50  50      float64\n",
      " 51  51      float64\n",
      " 52  52      float64\n",
      " 53  53      float64\n",
      " 54  54      float64\n",
      " 55  55      float64\n",
      " 56  56      float64\n",
      " 57  57      float64\n",
      " 58  58      float64\n",
      " 59  59      float64\n",
      " 60  60      float64\n",
      " 61  61      float64\n",
      " 62  62      float64\n",
      " 63  63      float64\n",
      " 64  64      float64\n",
      " 65  65      float64\n",
      " 66  66      float64\n",
      " 67  67      float64\n",
      " 68  68      float64\n",
      " 69  69      float64\n",
      " 70  70      float64\n",
      " 71  71      float64\n",
      "dtypes: float64(72)\n",
      "memory usage: 1.6+ GB\n"
     ]
    },
    {
     "data": {
      "text/plain": "None"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": "        count        mean         std          min          50%         max\n0   2918139.0  -86.229474   24.891320  -199.468700   -86.231500   21.515549\n1   2918139.0    8.080077    4.953387   -13.914608     8.038950   29.937210\n2   2918139.0  -44.580804   38.631660  -240.073400   -43.816605  160.937230\n3   2918139.0 -146.634991   19.844805  -232.667140  -146.776810  -51.374780\n4   2918139.0  111.316628   46.348090  -105.582960   111.873000  319.664500\n..        ...         ...         ...          ...          ...         ...\n67  2918139.0   23.544896   55.342236  -233.138170    23.416494  314.898770\n68  2918139.0   74.959301   61.345005  -203.601620    74.929970  339.573850\n69  2918139.0  115.566716   21.175183    15.724480   116.024445  214.706340\n70  2918139.0 -799.339026  385.413088 -1297.931468 -1074.464888   98.770811\n71  2918139.0  -47.791251   41.748021  -226.780060   -48.591960  126.973220\n\n[72 rows x 6 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>count</th>\n      <th>mean</th>\n      <th>std</th>\n      <th>min</th>\n      <th>50%</th>\n      <th>max</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2918139.0</td>\n      <td>-86.229474</td>\n      <td>24.891320</td>\n      <td>-199.468700</td>\n      <td>-86.231500</td>\n      <td>21.515549</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2918139.0</td>\n      <td>8.080077</td>\n      <td>4.953387</td>\n      <td>-13.914608</td>\n      <td>8.038950</td>\n      <td>29.937210</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>2918139.0</td>\n      <td>-44.580804</td>\n      <td>38.631660</td>\n      <td>-240.073400</td>\n      <td>-43.816605</td>\n      <td>160.937230</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>2918139.0</td>\n      <td>-146.634991</td>\n      <td>19.844805</td>\n      <td>-232.667140</td>\n      <td>-146.776810</td>\n      <td>-51.374780</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>2918139.0</td>\n      <td>111.316628</td>\n      <td>46.348090</td>\n      <td>-105.582960</td>\n      <td>111.873000</td>\n      <td>319.664500</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>67</th>\n      <td>2918139.0</td>\n      <td>23.544896</td>\n      <td>55.342236</td>\n      <td>-233.138170</td>\n      <td>23.416494</td>\n      <td>314.898770</td>\n    </tr>\n    <tr>\n      <th>68</th>\n      <td>2918139.0</td>\n      <td>74.959301</td>\n      <td>61.345005</td>\n      <td>-203.601620</td>\n      <td>74.929970</td>\n      <td>339.573850</td>\n    </tr>\n    <tr>\n      <th>69</th>\n      <td>2918139.0</td>\n      <td>115.566716</td>\n      <td>21.175183</td>\n      <td>15.724480</td>\n      <td>116.024445</td>\n      <td>214.706340</td>\n    </tr>\n    <tr>\n      <th>70</th>\n      <td>2918139.0</td>\n      <td>-799.339026</td>\n      <td>385.413088</td>\n      <td>-1297.931468</td>\n      <td>-1074.464888</td>\n      <td>98.770811</td>\n    </tr>\n    <tr>\n      <th>71</th>\n      <td>2918139.0</td>\n      <td>-47.791251</td>\n      <td>41.748021</td>\n      <td>-226.780060</td>\n      <td>-48.591960</td>\n      <td>126.973220</td>\n    </tr>\n  </tbody>\n</table>\n<p>72 rows × 6 columns</p>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Количество дублированных строк: 0\n"
     ]
    }
   ],
   "source": [
    "# Рассмотрим базовый датасет с данными\n",
    "describe_dataframe(df_base)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [
    {
     "data": {
      "text/plain": "                  0          1           2          3           4           5  \\\nId                                                                              \n0-query  -53.882748  17.971436  -42.117104 -183.93668  187.517490  -87.144930   \n1-query  -87.776370   6.806268  -32.054546 -177.26039  120.803330  -83.810590   \n2-query  -49.979565   3.841486 -116.118590 -180.40198  190.128430  -50.837620   \n3-query  -47.810562   9.086598 -115.401695 -121.01136   94.652840 -109.255410   \n4-query  -79.632126  14.442886  -58.903397 -147.05254   57.127068  -16.239529   \n5-query  -92.844185   2.975510  -61.760483 -171.67546  144.798370  -58.685143   \n6-query -127.996580   9.672705  -37.678320 -141.41304  119.940926  -76.850460   \n7-query  -59.506752   7.959120   21.068153 -142.99788  128.157990  -92.496300   \n8-query -111.359276   6.414694  -88.023330 -131.79814  157.491360  -83.075620   \n9-query  -75.485540   9.759681  -66.438310 -159.26770  110.319880  -65.563866   \n\n                  6          7           8           9  ...         63  \\\nId                                                      ...              \n0-query -347.360606  38.307602  109.085560   30.413513  ...  70.107360   \n1-query  -94.572749 -78.433090  124.915900  140.331070  ...   4.669178   \n2-query   26.943937 -30.447489  125.771164  211.607820  ...  78.039764   \n3-query -775.150134  79.186520  124.003100  242.650650  ...  44.515266   \n4-query -321.317964  45.984676  125.941284  103.392670  ...  45.028910   \n5-query  104.112909  75.844580  118.336230   81.981125  ...  12.807585   \n6-query  -15.455417  75.031220  128.762990  266.939580  ...  65.676480   \n7-query -143.453758  10.710561  131.086790   71.491560  ...  49.263610   \n8-query -759.626065  49.599327  120.683470   87.519910  ...  45.634857   \n9-query -325.035875 -77.376080  127.402720  173.257580  ...  17.841938   \n\n                64          65         66          67          68          69  \\\nId                                                                              \n0-query -155.80257 -101.965943  65.903790   34.457500   62.642094  134.763600   \n1-query -151.69771   -1.638704  68.170876   25.096191   89.974976  130.589630   \n2-query -169.14620   82.144186  66.008220   18.400496  212.409730  121.931470   \n3-query -145.41675   93.990981  64.131350  106.061920   83.178760  118.277725   \n4-query -196.09207 -117.626337  66.926220   42.456170   77.621765   92.479930   \n5-query -137.96362   86.282088  64.678535   64.527600   64.664440  126.914600   \n6-query -145.51813  139.211140  69.942120   21.280258   76.636410   85.143050   \n7-query -203.78833 -101.989379  67.291770   44.437595   45.183838  150.288530   \n8-query -136.84851  117.466574  68.664635  -19.192120   54.241932   92.495636   \n9-query -129.49008 -142.201450  68.498550  -58.113903  -13.233955  114.562810   \n\n                  70         71        Target  \nId                                             \n0-query  -415.750254 -25.958572   675816-base  \n1-query -1035.092211 -51.276833   366656-base  \n2-query -1074.464888 -22.547178  1447819-base  \n3-query -1074.464888 -19.902788  1472602-base  \n4-query -1074.464888 -21.149351   717819-base  \n5-query  -800.428664 -30.197390  2381316-base  \n6-query   -44.371931 -15.524637   773187-base  \n7-query -1074.464888 -66.052086  2488580-base  \n8-query    11.012047 -61.062813    24129-base  \n9-query -1074.464888  31.882130   775706-base  \n\n[10 rows x 73 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n      <th>3</th>\n      <th>4</th>\n      <th>5</th>\n      <th>6</th>\n      <th>7</th>\n      <th>8</th>\n      <th>9</th>\n      <th>...</th>\n      <th>63</th>\n      <th>64</th>\n      <th>65</th>\n      <th>66</th>\n      <th>67</th>\n      <th>68</th>\n      <th>69</th>\n      <th>70</th>\n      <th>71</th>\n      <th>Target</th>\n    </tr>\n    <tr>\n      <th>Id</th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0-query</th>\n      <td>-53.882748</td>\n      <td>17.971436</td>\n      <td>-42.117104</td>\n      <td>-183.93668</td>\n      <td>187.517490</td>\n      <td>-87.144930</td>\n      <td>-347.360606</td>\n      <td>38.307602</td>\n      <td>109.085560</td>\n      <td>30.413513</td>\n      <td>...</td>\n      <td>70.107360</td>\n      <td>-155.80257</td>\n      <td>-101.965943</td>\n      <td>65.903790</td>\n      <td>34.457500</td>\n      <td>62.642094</td>\n      <td>134.763600</td>\n      <td>-415.750254</td>\n      <td>-25.958572</td>\n      <td>675816-base</td>\n    </tr>\n    <tr>\n      <th>1-query</th>\n      <td>-87.776370</td>\n      <td>6.806268</td>\n      <td>-32.054546</td>\n      <td>-177.26039</td>\n      <td>120.803330</td>\n      <td>-83.810590</td>\n      <td>-94.572749</td>\n      <td>-78.433090</td>\n      <td>124.915900</td>\n      <td>140.331070</td>\n      <td>...</td>\n      <td>4.669178</td>\n      <td>-151.69771</td>\n      <td>-1.638704</td>\n      <td>68.170876</td>\n      <td>25.096191</td>\n      <td>89.974976</td>\n      <td>130.589630</td>\n      <td>-1035.092211</td>\n      <td>-51.276833</td>\n      <td>366656-base</td>\n    </tr>\n    <tr>\n      <th>2-query</th>\n      <td>-49.979565</td>\n      <td>3.841486</td>\n      <td>-116.118590</td>\n      <td>-180.40198</td>\n      <td>190.128430</td>\n      <td>-50.837620</td>\n      <td>26.943937</td>\n      <td>-30.447489</td>\n      <td>125.771164</td>\n      <td>211.607820</td>\n      <td>...</td>\n      <td>78.039764</td>\n      <td>-169.14620</td>\n      <td>82.144186</td>\n      <td>66.008220</td>\n      <td>18.400496</td>\n      <td>212.409730</td>\n      <td>121.931470</td>\n      <td>-1074.464888</td>\n      <td>-22.547178</td>\n      <td>1447819-base</td>\n    </tr>\n    <tr>\n      <th>3-query</th>\n      <td>-47.810562</td>\n      <td>9.086598</td>\n      <td>-115.401695</td>\n      <td>-121.01136</td>\n      <td>94.652840</td>\n      <td>-109.255410</td>\n      <td>-775.150134</td>\n      <td>79.186520</td>\n      <td>124.003100</td>\n      <td>242.650650</td>\n      <td>...</td>\n      <td>44.515266</td>\n      <td>-145.41675</td>\n      <td>93.990981</td>\n      <td>64.131350</td>\n      <td>106.061920</td>\n      <td>83.178760</td>\n      <td>118.277725</td>\n      <td>-1074.464888</td>\n      <td>-19.902788</td>\n      <td>1472602-base</td>\n    </tr>\n    <tr>\n      <th>4-query</th>\n      <td>-79.632126</td>\n      <td>14.442886</td>\n      <td>-58.903397</td>\n      <td>-147.05254</td>\n      <td>57.127068</td>\n      <td>-16.239529</td>\n      <td>-321.317964</td>\n      <td>45.984676</td>\n      <td>125.941284</td>\n      <td>103.392670</td>\n      <td>...</td>\n      <td>45.028910</td>\n      <td>-196.09207</td>\n      <td>-117.626337</td>\n      <td>66.926220</td>\n      <td>42.456170</td>\n      <td>77.621765</td>\n      <td>92.479930</td>\n      <td>-1074.464888</td>\n      <td>-21.149351</td>\n      <td>717819-base</td>\n    </tr>\n    <tr>\n      <th>5-query</th>\n      <td>-92.844185</td>\n      <td>2.975510</td>\n      <td>-61.760483</td>\n      <td>-171.67546</td>\n      <td>144.798370</td>\n      <td>-58.685143</td>\n      <td>104.112909</td>\n      <td>75.844580</td>\n      <td>118.336230</td>\n      <td>81.981125</td>\n      <td>...</td>\n      <td>12.807585</td>\n      <td>-137.96362</td>\n      <td>86.282088</td>\n      <td>64.678535</td>\n      <td>64.527600</td>\n      <td>64.664440</td>\n      <td>126.914600</td>\n      <td>-800.428664</td>\n      <td>-30.197390</td>\n      <td>2381316-base</td>\n    </tr>\n    <tr>\n      <th>6-query</th>\n      <td>-127.996580</td>\n      <td>9.672705</td>\n      <td>-37.678320</td>\n      <td>-141.41304</td>\n      <td>119.940926</td>\n      <td>-76.850460</td>\n      <td>-15.455417</td>\n      <td>75.031220</td>\n      <td>128.762990</td>\n      <td>266.939580</td>\n      <td>...</td>\n      <td>65.676480</td>\n      <td>-145.51813</td>\n      <td>139.211140</td>\n      <td>69.942120</td>\n      <td>21.280258</td>\n      <td>76.636410</td>\n      <td>85.143050</td>\n      <td>-44.371931</td>\n      <td>-15.524637</td>\n      <td>773187-base</td>\n    </tr>\n    <tr>\n      <th>7-query</th>\n      <td>-59.506752</td>\n      <td>7.959120</td>\n      <td>21.068153</td>\n      <td>-142.99788</td>\n      <td>128.157990</td>\n      <td>-92.496300</td>\n      <td>-143.453758</td>\n      <td>10.710561</td>\n      <td>131.086790</td>\n      <td>71.491560</td>\n      <td>...</td>\n      <td>49.263610</td>\n      <td>-203.78833</td>\n      <td>-101.989379</td>\n      <td>67.291770</td>\n      <td>44.437595</td>\n      <td>45.183838</td>\n      <td>150.288530</td>\n      <td>-1074.464888</td>\n      <td>-66.052086</td>\n      <td>2488580-base</td>\n    </tr>\n    <tr>\n      <th>8-query</th>\n      <td>-111.359276</td>\n      <td>6.414694</td>\n      <td>-88.023330</td>\n      <td>-131.79814</td>\n      <td>157.491360</td>\n      <td>-83.075620</td>\n      <td>-759.626065</td>\n      <td>49.599327</td>\n      <td>120.683470</td>\n      <td>87.519910</td>\n      <td>...</td>\n      <td>45.634857</td>\n      <td>-136.84851</td>\n      <td>117.466574</td>\n      <td>68.664635</td>\n      <td>-19.192120</td>\n      <td>54.241932</td>\n      <td>92.495636</td>\n      <td>11.012047</td>\n      <td>-61.062813</td>\n      <td>24129-base</td>\n    </tr>\n    <tr>\n      <th>9-query</th>\n      <td>-75.485540</td>\n      <td>9.759681</td>\n      <td>-66.438310</td>\n      <td>-159.26770</td>\n      <td>110.319880</td>\n      <td>-65.563866</td>\n      <td>-325.035875</td>\n      <td>-77.376080</td>\n      <td>127.402720</td>\n      <td>173.257580</td>\n      <td>...</td>\n      <td>17.841938</td>\n      <td>-129.49008</td>\n      <td>-142.201450</td>\n      <td>68.498550</td>\n      <td>-58.113903</td>\n      <td>-13.233955</td>\n      <td>114.562810</td>\n      <td>-1074.464888</td>\n      <td>31.882130</td>\n      <td>775706-base</td>\n    </tr>\n  </tbody>\n</table>\n<p>10 rows × 73 columns</p>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 100000 entries, 0-query to 99999-query\n",
      "Data columns (total 73 columns):\n",
      " #   Column  Non-Null Count   Dtype  \n",
      "---  ------  --------------   -----  \n",
      " 0   0       100000 non-null  float64\n",
      " 1   1       100000 non-null  float64\n",
      " 2   2       100000 non-null  float64\n",
      " 3   3       100000 non-null  float64\n",
      " 4   4       100000 non-null  float64\n",
      " 5   5       100000 non-null  float64\n",
      " 6   6       100000 non-null  float64\n",
      " 7   7       100000 non-null  float64\n",
      " 8   8       100000 non-null  float64\n",
      " 9   9       100000 non-null  float64\n",
      " 10  10      100000 non-null  float64\n",
      " 11  11      100000 non-null  float64\n",
      " 12  12      100000 non-null  float64\n",
      " 13  13      100000 non-null  float64\n",
      " 14  14      100000 non-null  float64\n",
      " 15  15      100000 non-null  float64\n",
      " 16  16      100000 non-null  float64\n",
      " 17  17      100000 non-null  float64\n",
      " 18  18      100000 non-null  float64\n",
      " 19  19      100000 non-null  float64\n",
      " 20  20      100000 non-null  float64\n",
      " 21  21      100000 non-null  float64\n",
      " 22  22      100000 non-null  float64\n",
      " 23  23      100000 non-null  float64\n",
      " 24  24      100000 non-null  float64\n",
      " 25  25      100000 non-null  float64\n",
      " 26  26      100000 non-null  float64\n",
      " 27  27      100000 non-null  float64\n",
      " 28  28      100000 non-null  float64\n",
      " 29  29      100000 non-null  float64\n",
      " 30  30      100000 non-null  float64\n",
      " 31  31      100000 non-null  float64\n",
      " 32  32      100000 non-null  float64\n",
      " 33  33      100000 non-null  float64\n",
      " 34  34      100000 non-null  float64\n",
      " 35  35      100000 non-null  float64\n",
      " 36  36      100000 non-null  float64\n",
      " 37  37      100000 non-null  float64\n",
      " 38  38      100000 non-null  float64\n",
      " 39  39      100000 non-null  float64\n",
      " 40  40      100000 non-null  float64\n",
      " 41  41      100000 non-null  float64\n",
      " 42  42      100000 non-null  float64\n",
      " 43  43      100000 non-null  float64\n",
      " 44  44      100000 non-null  float64\n",
      " 45  45      100000 non-null  float64\n",
      " 46  46      100000 non-null  float64\n",
      " 47  47      100000 non-null  float64\n",
      " 48  48      100000 non-null  float64\n",
      " 49  49      100000 non-null  float64\n",
      " 50  50      100000 non-null  float64\n",
      " 51  51      100000 non-null  float64\n",
      " 52  52      100000 non-null  float64\n",
      " 53  53      100000 non-null  float64\n",
      " 54  54      100000 non-null  float64\n",
      " 55  55      100000 non-null  float64\n",
      " 56  56      100000 non-null  float64\n",
      " 57  57      100000 non-null  float64\n",
      " 58  58      100000 non-null  float64\n",
      " 59  59      100000 non-null  float64\n",
      " 60  60      100000 non-null  float64\n",
      " 61  61      100000 non-null  float64\n",
      " 62  62      100000 non-null  float64\n",
      " 63  63      100000 non-null  float64\n",
      " 64  64      100000 non-null  float64\n",
      " 65  65      100000 non-null  float64\n",
      " 66  66      100000 non-null  float64\n",
      " 67  67      100000 non-null  float64\n",
      " 68  68      100000 non-null  float64\n",
      " 69  69      100000 non-null  float64\n",
      " 70  70      100000 non-null  float64\n",
      " 71  71      100000 non-null  float64\n",
      " 72  Target  100000 non-null  object \n",
      "dtypes: float64(72), object(1)\n",
      "memory usage: 56.5+ MB\n"
     ]
    },
    {
     "data": {
      "text/plain": "None"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": "       count        mean         std          min         50%         max\n0   100000.0  -85.328679   25.803845  -186.280270  -85.273695   14.585236\n1   100000.0    7.664345    4.955651   -11.560507    7.652854   28.917845\n2   100000.0  -43.667046   39.111064  -224.896060  -42.830246  128.108460\n3   100000.0 -146.118630   20.434841  -223.307220 -146.067445  -60.751625\n4   100000.0  111.770592   47.700958   -93.272020  112.260100  301.363600\n..       ...         ...         ...          ...         ...         ...\n67  100000.0   23.029277   55.470761  -203.746380   23.441363  266.493320\n68  100000.0   73.412076   62.203132  -181.973820   72.880192  319.867520\n69  100000.0  115.189717   21.582238    22.598862  115.236635  201.761260\n70  100000.0 -709.761548  405.961084 -1297.871984 -808.801696   98.768233\n71  100000.0  -48.505704   41.215124  -209.935760  -48.700929  126.191790\n\n[72 rows x 6 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>count</th>\n      <th>mean</th>\n      <th>std</th>\n      <th>min</th>\n      <th>50%</th>\n      <th>max</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>100000.0</td>\n      <td>-85.328679</td>\n      <td>25.803845</td>\n      <td>-186.280270</td>\n      <td>-85.273695</td>\n      <td>14.585236</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>100000.0</td>\n      <td>7.664345</td>\n      <td>4.955651</td>\n      <td>-11.560507</td>\n      <td>7.652854</td>\n      <td>28.917845</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>100000.0</td>\n      <td>-43.667046</td>\n      <td>39.111064</td>\n      <td>-224.896060</td>\n      <td>-42.830246</td>\n      <td>128.108460</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>100000.0</td>\n      <td>-146.118630</td>\n      <td>20.434841</td>\n      <td>-223.307220</td>\n      <td>-146.067445</td>\n      <td>-60.751625</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>100000.0</td>\n      <td>111.770592</td>\n      <td>47.700958</td>\n      <td>-93.272020</td>\n      <td>112.260100</td>\n      <td>301.363600</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>67</th>\n      <td>100000.0</td>\n      <td>23.029277</td>\n      <td>55.470761</td>\n      <td>-203.746380</td>\n      <td>23.441363</td>\n      <td>266.493320</td>\n    </tr>\n    <tr>\n      <th>68</th>\n      <td>100000.0</td>\n      <td>73.412076</td>\n      <td>62.203132</td>\n      <td>-181.973820</td>\n      <td>72.880192</td>\n      <td>319.867520</td>\n    </tr>\n    <tr>\n      <th>69</th>\n      <td>100000.0</td>\n      <td>115.189717</td>\n      <td>21.582238</td>\n      <td>22.598862</td>\n      <td>115.236635</td>\n      <td>201.761260</td>\n    </tr>\n    <tr>\n      <th>70</th>\n      <td>100000.0</td>\n      <td>-709.761548</td>\n      <td>405.961084</td>\n      <td>-1297.871984</td>\n      <td>-808.801696</td>\n      <td>98.768233</td>\n    </tr>\n    <tr>\n      <th>71</th>\n      <td>100000.0</td>\n      <td>-48.505704</td>\n      <td>41.215124</td>\n      <td>-209.935760</td>\n      <td>-48.700929</td>\n      <td>126.191790</td>\n    </tr>\n  </tbody>\n</table>\n<p>72 rows × 6 columns</p>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Количество дублированных строк: 0\n"
     ]
    }
   ],
   "source": [
    "# Рассмотрим датасет с тренировочными (для модели градиентного бустинга) данными\n",
    "describe_dataframe(df_train)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [
    {
     "data": {
      "text/plain": "                       0          1          2          3           4  \\\nId                                                                      \n100000-query  -57.372734   3.597752 -13.213642 -125.92679  110.745940   \n100001-query  -53.758705  12.790300 -43.268543 -134.41762  114.449910   \n100002-query  -64.175095  -3.980927  -7.679249 -170.16093   96.446160   \n100003-query  -99.286860  16.123936   9.837166 -148.06044   83.697080   \n100004-query  -79.532920  -0.364173 -16.027431 -170.88495  165.453920   \n100005-query  -89.745360   6.317698 -80.744650 -114.53197  153.960340   \n100006-query  -64.514260   7.711647 -28.726236 -220.05089  186.177460   \n100007-query -119.026850   7.536469 -62.973827 -142.94609  150.376110   \n100008-query -100.618990  10.874402 -59.983580 -147.85175   86.138500   \n100009-query  -58.379074  12.812809 -37.571396 -186.67310   89.644264   \n\n                       5           6           7           8           9  ...  \\\nId                                                                        ...   \n100000-query  -81.279594 -461.003172  139.815720  112.880980   75.215750  ...   \n100001-query  -90.520130 -759.626065   63.995087  127.117905   53.128998  ...   \n100002-query  -62.377740 -759.626065   87.477554  131.270110  168.920320  ...   \n100003-query -133.729720   58.576403  -19.046660  115.042404   75.206730  ...   \n100004-query  -28.291668   33.931936   34.411217  128.903980  102.086914  ...   \n100005-query  -74.897130 -208.928691  -32.214005  115.582855   61.603172  ...   \n100006-query  -42.254353   96.324664  -30.496332  109.519530  217.348480  ...   \n100007-query  -92.343550 -530.124724   24.280703  124.623260  119.622160  ...   \n100008-query  -90.452470 -638.720433  111.696850  120.869210  125.254510  ...   \n100009-query  -43.858597 -394.429315   -1.023273  113.776436   26.608970  ...   \n\n                      62         63         64          65        66  \\\nId                                                                     \n100000-query  -75.513020  52.830902 -143.43945   59.051935  69.28224   \n100001-query  -79.441830  29.185436 -168.60590  -82.872443  70.76560   \n100002-query -134.795410  37.368730 -159.66231 -119.232725  67.71044   \n100003-query  -77.236110  44.100494 -132.53012 -106.318982  70.88396   \n100004-query -123.770250  45.635944 -134.25893   13.735359  70.61763   \n100005-query  -64.934890  37.824436 -153.04173 -131.257912  68.26281   \n100006-query  -31.122826  11.672802 -112.34755  183.939297  67.22618   \n100007-query -121.699980  49.379295 -211.29207   37.299723  68.56667   \n100008-query  -68.627860  76.054110 -176.23720  173.788344  68.73241   \n100009-query  -56.836964  69.649020 -206.33858  -26.645270  68.19587   \n\n                      67          68          69           70          71  \nId                                                                         \n100000-query   61.927513  111.592530  115.140656 -1099.130485 -117.079360  \n100001-query  -65.975950   97.077160  123.391640  -744.442332  -25.009320  \n100002-query   86.002060  137.636410  141.081630  -294.052271  -70.969604  \n100003-query   23.577892  133.183960  143.252940  -799.363667  -89.392670  \n100004-query   15.332115  154.568120  101.700640 -1171.892332 -125.307890  \n100005-query   56.239280  120.646900   76.342550 -1156.992950  -72.146390  \n100006-query   65.571060   -6.655426   95.882780 -1176.878727  -37.918420  \n100007-query   21.038134   37.364270  116.667170 -1129.242913  -87.194520  \n100008-query  222.442140  -38.008790  111.531290 -1231.711711  -85.407120  \n100009-query  -41.897000  151.713680  119.582800  -729.465551  -70.948770  \n\n[10 rows x 72 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n      <th>3</th>\n      <th>4</th>\n      <th>5</th>\n      <th>6</th>\n      <th>7</th>\n      <th>8</th>\n      <th>9</th>\n      <th>...</th>\n      <th>62</th>\n      <th>63</th>\n      <th>64</th>\n      <th>65</th>\n      <th>66</th>\n      <th>67</th>\n      <th>68</th>\n      <th>69</th>\n      <th>70</th>\n      <th>71</th>\n    </tr>\n    <tr>\n      <th>Id</th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>100000-query</th>\n      <td>-57.372734</td>\n      <td>3.597752</td>\n      <td>-13.213642</td>\n      <td>-125.92679</td>\n      <td>110.745940</td>\n      <td>-81.279594</td>\n      <td>-461.003172</td>\n      <td>139.815720</td>\n      <td>112.880980</td>\n      <td>75.215750</td>\n      <td>...</td>\n      <td>-75.513020</td>\n      <td>52.830902</td>\n      <td>-143.43945</td>\n      <td>59.051935</td>\n      <td>69.28224</td>\n      <td>61.927513</td>\n      <td>111.592530</td>\n      <td>115.140656</td>\n      <td>-1099.130485</td>\n      <td>-117.079360</td>\n    </tr>\n    <tr>\n      <th>100001-query</th>\n      <td>-53.758705</td>\n      <td>12.790300</td>\n      <td>-43.268543</td>\n      <td>-134.41762</td>\n      <td>114.449910</td>\n      <td>-90.520130</td>\n      <td>-759.626065</td>\n      <td>63.995087</td>\n      <td>127.117905</td>\n      <td>53.128998</td>\n      <td>...</td>\n      <td>-79.441830</td>\n      <td>29.185436</td>\n      <td>-168.60590</td>\n      <td>-82.872443</td>\n      <td>70.76560</td>\n      <td>-65.975950</td>\n      <td>97.077160</td>\n      <td>123.391640</td>\n      <td>-744.442332</td>\n      <td>-25.009320</td>\n    </tr>\n    <tr>\n      <th>100002-query</th>\n      <td>-64.175095</td>\n      <td>-3.980927</td>\n      <td>-7.679249</td>\n      <td>-170.16093</td>\n      <td>96.446160</td>\n      <td>-62.377740</td>\n      <td>-759.626065</td>\n      <td>87.477554</td>\n      <td>131.270110</td>\n      <td>168.920320</td>\n      <td>...</td>\n      <td>-134.795410</td>\n      <td>37.368730</td>\n      <td>-159.66231</td>\n      <td>-119.232725</td>\n      <td>67.71044</td>\n      <td>86.002060</td>\n      <td>137.636410</td>\n      <td>141.081630</td>\n      <td>-294.052271</td>\n      <td>-70.969604</td>\n    </tr>\n    <tr>\n      <th>100003-query</th>\n      <td>-99.286860</td>\n      <td>16.123936</td>\n      <td>9.837166</td>\n      <td>-148.06044</td>\n      <td>83.697080</td>\n      <td>-133.729720</td>\n      <td>58.576403</td>\n      <td>-19.046660</td>\n      <td>115.042404</td>\n      <td>75.206730</td>\n      <td>...</td>\n      <td>-77.236110</td>\n      <td>44.100494</td>\n      <td>-132.53012</td>\n      <td>-106.318982</td>\n      <td>70.88396</td>\n      <td>23.577892</td>\n      <td>133.183960</td>\n      <td>143.252940</td>\n      <td>-799.363667</td>\n      <td>-89.392670</td>\n    </tr>\n    <tr>\n      <th>100004-query</th>\n      <td>-79.532920</td>\n      <td>-0.364173</td>\n      <td>-16.027431</td>\n      <td>-170.88495</td>\n      <td>165.453920</td>\n      <td>-28.291668</td>\n      <td>33.931936</td>\n      <td>34.411217</td>\n      <td>128.903980</td>\n      <td>102.086914</td>\n      <td>...</td>\n      <td>-123.770250</td>\n      <td>45.635944</td>\n      <td>-134.25893</td>\n      <td>13.735359</td>\n      <td>70.61763</td>\n      <td>15.332115</td>\n      <td>154.568120</td>\n      <td>101.700640</td>\n      <td>-1171.892332</td>\n      <td>-125.307890</td>\n    </tr>\n    <tr>\n      <th>100005-query</th>\n      <td>-89.745360</td>\n      <td>6.317698</td>\n      <td>-80.744650</td>\n      <td>-114.53197</td>\n      <td>153.960340</td>\n      <td>-74.897130</td>\n      <td>-208.928691</td>\n      <td>-32.214005</td>\n      <td>115.582855</td>\n      <td>61.603172</td>\n      <td>...</td>\n      <td>-64.934890</td>\n      <td>37.824436</td>\n      <td>-153.04173</td>\n      <td>-131.257912</td>\n      <td>68.26281</td>\n      <td>56.239280</td>\n      <td>120.646900</td>\n      <td>76.342550</td>\n      <td>-1156.992950</td>\n      <td>-72.146390</td>\n    </tr>\n    <tr>\n      <th>100006-query</th>\n      <td>-64.514260</td>\n      <td>7.711647</td>\n      <td>-28.726236</td>\n      <td>-220.05089</td>\n      <td>186.177460</td>\n      <td>-42.254353</td>\n      <td>96.324664</td>\n      <td>-30.496332</td>\n      <td>109.519530</td>\n      <td>217.348480</td>\n      <td>...</td>\n      <td>-31.122826</td>\n      <td>11.672802</td>\n      <td>-112.34755</td>\n      <td>183.939297</td>\n      <td>67.22618</td>\n      <td>65.571060</td>\n      <td>-6.655426</td>\n      <td>95.882780</td>\n      <td>-1176.878727</td>\n      <td>-37.918420</td>\n    </tr>\n    <tr>\n      <th>100007-query</th>\n      <td>-119.026850</td>\n      <td>7.536469</td>\n      <td>-62.973827</td>\n      <td>-142.94609</td>\n      <td>150.376110</td>\n      <td>-92.343550</td>\n      <td>-530.124724</td>\n      <td>24.280703</td>\n      <td>124.623260</td>\n      <td>119.622160</td>\n      <td>...</td>\n      <td>-121.699980</td>\n      <td>49.379295</td>\n      <td>-211.29207</td>\n      <td>37.299723</td>\n      <td>68.56667</td>\n      <td>21.038134</td>\n      <td>37.364270</td>\n      <td>116.667170</td>\n      <td>-1129.242913</td>\n      <td>-87.194520</td>\n    </tr>\n    <tr>\n      <th>100008-query</th>\n      <td>-100.618990</td>\n      <td>10.874402</td>\n      <td>-59.983580</td>\n      <td>-147.85175</td>\n      <td>86.138500</td>\n      <td>-90.452470</td>\n      <td>-638.720433</td>\n      <td>111.696850</td>\n      <td>120.869210</td>\n      <td>125.254510</td>\n      <td>...</td>\n      <td>-68.627860</td>\n      <td>76.054110</td>\n      <td>-176.23720</td>\n      <td>173.788344</td>\n      <td>68.73241</td>\n      <td>222.442140</td>\n      <td>-38.008790</td>\n      <td>111.531290</td>\n      <td>-1231.711711</td>\n      <td>-85.407120</td>\n    </tr>\n    <tr>\n      <th>100009-query</th>\n      <td>-58.379074</td>\n      <td>12.812809</td>\n      <td>-37.571396</td>\n      <td>-186.67310</td>\n      <td>89.644264</td>\n      <td>-43.858597</td>\n      <td>-394.429315</td>\n      <td>-1.023273</td>\n      <td>113.776436</td>\n      <td>26.608970</td>\n      <td>...</td>\n      <td>-56.836964</td>\n      <td>69.649020</td>\n      <td>-206.33858</td>\n      <td>-26.645270</td>\n      <td>68.19587</td>\n      <td>-41.897000</td>\n      <td>151.713680</td>\n      <td>119.582800</td>\n      <td>-729.465551</td>\n      <td>-70.948770</td>\n    </tr>\n  </tbody>\n</table>\n<p>10 rows × 72 columns</p>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 100000 entries, 100000-query to 199999-query\n",
      "Data columns (total 72 columns):\n",
      " #   Column  Non-Null Count   Dtype  \n",
      "---  ------  --------------   -----  \n",
      " 0   0       100000 non-null  float64\n",
      " 1   1       100000 non-null  float64\n",
      " 2   2       100000 non-null  float64\n",
      " 3   3       100000 non-null  float64\n",
      " 4   4       100000 non-null  float64\n",
      " 5   5       100000 non-null  float64\n",
      " 6   6       100000 non-null  float64\n",
      " 7   7       100000 non-null  float64\n",
      " 8   8       100000 non-null  float64\n",
      " 9   9       100000 non-null  float64\n",
      " 10  10      100000 non-null  float64\n",
      " 11  11      100000 non-null  float64\n",
      " 12  12      100000 non-null  float64\n",
      " 13  13      100000 non-null  float64\n",
      " 14  14      100000 non-null  float64\n",
      " 15  15      100000 non-null  float64\n",
      " 16  16      100000 non-null  float64\n",
      " 17  17      100000 non-null  float64\n",
      " 18  18      100000 non-null  float64\n",
      " 19  19      100000 non-null  float64\n",
      " 20  20      100000 non-null  float64\n",
      " 21  21      100000 non-null  float64\n",
      " 22  22      100000 non-null  float64\n",
      " 23  23      100000 non-null  float64\n",
      " 24  24      100000 non-null  float64\n",
      " 25  25      100000 non-null  float64\n",
      " 26  26      100000 non-null  float64\n",
      " 27  27      100000 non-null  float64\n",
      " 28  28      100000 non-null  float64\n",
      " 29  29      100000 non-null  float64\n",
      " 30  30      100000 non-null  float64\n",
      " 31  31      100000 non-null  float64\n",
      " 32  32      100000 non-null  float64\n",
      " 33  33      100000 non-null  float64\n",
      " 34  34      100000 non-null  float64\n",
      " 35  35      100000 non-null  float64\n",
      " 36  36      100000 non-null  float64\n",
      " 37  37      100000 non-null  float64\n",
      " 38  38      100000 non-null  float64\n",
      " 39  39      100000 non-null  float64\n",
      " 40  40      100000 non-null  float64\n",
      " 41  41      100000 non-null  float64\n",
      " 42  42      100000 non-null  float64\n",
      " 43  43      100000 non-null  float64\n",
      " 44  44      100000 non-null  float64\n",
      " 45  45      100000 non-null  float64\n",
      " 46  46      100000 non-null  float64\n",
      " 47  47      100000 non-null  float64\n",
      " 48  48      100000 non-null  float64\n",
      " 49  49      100000 non-null  float64\n",
      " 50  50      100000 non-null  float64\n",
      " 51  51      100000 non-null  float64\n",
      " 52  52      100000 non-null  float64\n",
      " 53  53      100000 non-null  float64\n",
      " 54  54      100000 non-null  float64\n",
      " 55  55      100000 non-null  float64\n",
      " 56  56      100000 non-null  float64\n",
      " 57  57      100000 non-null  float64\n",
      " 58  58      100000 non-null  float64\n",
      " 59  59      100000 non-null  float64\n",
      " 60  60      100000 non-null  float64\n",
      " 61  61      100000 non-null  float64\n",
      " 62  62      100000 non-null  float64\n",
      " 63  63      100000 non-null  float64\n",
      " 64  64      100000 non-null  float64\n",
      " 65  65      100000 non-null  float64\n",
      " 66  66      100000 non-null  float64\n",
      " 67  67      100000 non-null  float64\n",
      " 68  68      100000 non-null  float64\n",
      " 69  69      100000 non-null  float64\n",
      " 70  70      100000 non-null  float64\n",
      " 71  71      100000 non-null  float64\n",
      "dtypes: float64(72)\n",
      "memory usage: 55.7+ MB\n"
     ]
    },
    {
     "data": {
      "text/plain": "None"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": "       count        mean         std          min         50%         max\n0   100000.0  -85.302233   25.777321  -190.353330  -85.296745   14.427986\n1   100000.0    7.669724    4.956990   -11.109877    7.657888   27.409784\n2   100000.0  -43.842474   39.138775  -217.538420  -43.230835  134.859800\n3   100000.0 -146.119797   20.495541  -220.050890 -146.080365  -57.381890\n4   100000.0  111.635071   47.751576   -81.198990  111.959330  302.065370\n..       ...         ...         ...          ...         ...         ...\n67  100000.0   23.250779   55.403862  -210.672800   23.508739  251.288590\n68  100000.0   73.114446   62.056224  -175.921780   72.152398  305.937530\n69  100000.0  115.196935   21.493081    25.271042  115.280990  201.599980\n70  100000.0 -709.457021  405.665764 -1297.923999 -807.029697   98.737079\n71  100000.0  -48.416276   41.292843  -209.935760  -48.670001  111.831955\n\n[72 rows x 6 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>count</th>\n      <th>mean</th>\n      <th>std</th>\n      <th>min</th>\n      <th>50%</th>\n      <th>max</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>100000.0</td>\n      <td>-85.302233</td>\n      <td>25.777321</td>\n      <td>-190.353330</td>\n      <td>-85.296745</td>\n      <td>14.427986</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>100000.0</td>\n      <td>7.669724</td>\n      <td>4.956990</td>\n      <td>-11.109877</td>\n      <td>7.657888</td>\n      <td>27.409784</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>100000.0</td>\n      <td>-43.842474</td>\n      <td>39.138775</td>\n      <td>-217.538420</td>\n      <td>-43.230835</td>\n      <td>134.859800</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>100000.0</td>\n      <td>-146.119797</td>\n      <td>20.495541</td>\n      <td>-220.050890</td>\n      <td>-146.080365</td>\n      <td>-57.381890</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>100000.0</td>\n      <td>111.635071</td>\n      <td>47.751576</td>\n      <td>-81.198990</td>\n      <td>111.959330</td>\n      <td>302.065370</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>67</th>\n      <td>100000.0</td>\n      <td>23.250779</td>\n      <td>55.403862</td>\n      <td>-210.672800</td>\n      <td>23.508739</td>\n      <td>251.288590</td>\n    </tr>\n    <tr>\n      <th>68</th>\n      <td>100000.0</td>\n      <td>73.114446</td>\n      <td>62.056224</td>\n      <td>-175.921780</td>\n      <td>72.152398</td>\n      <td>305.937530</td>\n    </tr>\n    <tr>\n      <th>69</th>\n      <td>100000.0</td>\n      <td>115.196935</td>\n      <td>21.493081</td>\n      <td>25.271042</td>\n      <td>115.280990</td>\n      <td>201.599980</td>\n    </tr>\n    <tr>\n      <th>70</th>\n      <td>100000.0</td>\n      <td>-709.457021</td>\n      <td>405.665764</td>\n      <td>-1297.923999</td>\n      <td>-807.029697</td>\n      <td>98.737079</td>\n    </tr>\n    <tr>\n      <th>71</th>\n      <td>100000.0</td>\n      <td>-48.416276</td>\n      <td>41.292843</td>\n      <td>-209.935760</td>\n      <td>-48.670001</td>\n      <td>111.831955</td>\n    </tr>\n  </tbody>\n</table>\n<p>72 rows × 6 columns</p>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Количество дублированных строк: 0\n"
     ]
    }
   ],
   "source": [
    "# Рассмотрим датасет с валидационными данными\n",
    "describe_dataframe(df_validation)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [
    {
     "data": {
      "text/plain": "                  Expected\nId                        \n100000-query  2676668-base\n100001-query    91606-base\n100002-query   472256-base\n100003-query  3168654-base\n100004-query    75484-base\n100005-query  1905037-base\n100006-query   306584-base\n100007-query  1533713-base\n100008-query  2796017-base\n100009-query  1777304-base",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>Expected</th>\n    </tr>\n    <tr>\n      <th>Id</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>100000-query</th>\n      <td>2676668-base</td>\n    </tr>\n    <tr>\n      <th>100001-query</th>\n      <td>91606-base</td>\n    </tr>\n    <tr>\n      <th>100002-query</th>\n      <td>472256-base</td>\n    </tr>\n    <tr>\n      <th>100003-query</th>\n      <td>3168654-base</td>\n    </tr>\n    <tr>\n      <th>100004-query</th>\n      <td>75484-base</td>\n    </tr>\n    <tr>\n      <th>100005-query</th>\n      <td>1905037-base</td>\n    </tr>\n    <tr>\n      <th>100006-query</th>\n      <td>306584-base</td>\n    </tr>\n    <tr>\n      <th>100007-query</th>\n      <td>1533713-base</td>\n    </tr>\n    <tr>\n      <th>100008-query</th>\n      <td>2796017-base</td>\n    </tr>\n    <tr>\n      <th>100009-query</th>\n      <td>1777304-base</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 100000 entries, 100000-query to 199999-query\n",
      "Data columns (total 1 columns):\n",
      " #   Column    Non-Null Count   Dtype \n",
      "---  ------    --------------   ----- \n",
      " 0   Expected  100000 non-null  object\n",
      "dtypes: object(1)\n",
      "memory usage: 1.5+ MB\n"
     ]
    },
    {
     "data": {
      "text/plain": "None"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": "           count unique          top freq\nExpected  100000  91502  210304-base    7",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>count</th>\n      <th>unique</th>\n      <th>top</th>\n      <th>freq</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>Expected</th>\n      <td>100000</td>\n      <td>91502</td>\n      <td>210304-base</td>\n      <td>7</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Количество дублированных строк: 8498\n"
     ]
    }
   ],
   "source": [
    "# Рассмотрим датасет с ответами на валидационные данные\n",
    "describe_dataframe(df_validation_answer)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Выводы: Записи во всех 4х датасетах имеют одинаковую размерность, не имеют пропусков, не имеют явных выбросов. Данные являются готовыми эмбэдингами для векторных баз данных, поэтому дальнейшее их рассмотрение по моему мнению не имеет особого смысла, распределения и тд не повлияет на модель обучения градиентного бустинга или модели поиска ближайших соседей. Также стоит отметить, что в ответах к валидационным данными присутствуют дубликаты, что говорит о том, что для различных векторов может быть один и тот же \"самый похожий вектор\"."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Модель для поиска ближайших соседей"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### FAISS для поиска ближайших соседей."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "FAISS (Facebook AI Similarity Search) - это библиотека, разработанная Facebook AI Research, которая предоставляет эффективные алгоритмы для поиска ближайших соседей в больших наборах данных. Она основана на индексации векторных представлений и может использоваться для различных задач, включая сопоставление лиц, семантический поиск и рекомендательные системы.\n",
    "\n",
    "Данная модель отлично подойдёт для текущей задачи."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Скейлинг данных"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Однако, перед обучением и применением модели для поиска ближайших соседей стоит произвести скейлинг данных (привести их к нормальному распределению с одинаковой дисперсией). Для данной задачи воспользуемся StandardScaler из библиотеки sklearn."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "outputs": [
    {
     "data": {
      "text/plain": "StandardScaler()",
      "text/html": "<style>#sk-container-id-1 {color: black;background-color: white;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>StandardScaler()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">StandardScaler</label><div class=\"sk-toggleable__content\"><pre>StandardScaler()</pre></div></div></div></div></div>"
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Создаём и обучаем экземпляр StandardScaler\n",
    "scaler = StandardScaler()\n",
    "scaler.fit(df_base)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "outputs": [],
   "source": [
    "df_base_scaled = pd.DataFrame(scaler.transform(df_base), index=df_base.index)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "outputs": [
    {
     "data": {
      "text/plain": "array([[-1.1592162 ,  0.6203504 , -0.51372266, ..., -0.0140507 ,\n         1.7814198 , -0.3123287 ],\n       [ 2.0757148 ,  1.0604233 , -0.652491  , ...,  0.05984761,\n         1.8537259 , -0.2810519 ],\n       [ 1.2854173 , -0.34334213,  0.39787757, ...,  0.04852086,\n        -0.7138468 ,  0.36562327],\n       ...,\n       [-0.43377605, -2.064035  , -0.6909693 , ...,  0.65973157,\n        -0.7138468 ,  1.2577736 ],\n       [-0.02446461,  0.16793925,  0.25220424, ...,  0.43807346,\n        -0.7138468 , -0.19157949],\n       [-0.6321802 ,  0.96488   , -0.17634064, ..., -1.0735649 ,\n        -0.7121896 ,  1.4986584 ]], dtype=float32)"
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dims = df_base.shape[1]\n",
    "n_cells = 1\n",
    "df_base_scaled_values = np.ascontiguousarray(df_base_scaled).astype('float32')\n",
    "df_base_scaled_values"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "outputs": [],
   "source": [
    "# Инициализация индексатора FAISS\n",
    "quantizer = faiss.IndexFlatL2(dims)\n",
    "idx_l2 = faiss.IndexIVFFlat(quantizer, dims, n_cells)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "outputs": [],
   "source": [
    "# Обучаем индексатор и добавляем в него все данные из базового датасета\n",
    "idx_l2.train(df_base_scaled_values)\n",
    "\n",
    "idx_l2.add(df_base_scaled_values)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Подбор оптимального количества кластеров"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Вопрос - какое количество кластеров необходимо и достаточно для оптимального решения данной нам задачи?\n",
    "Я считаю, что в данном случае нас интересует именно точность подбора похожих товаров. Поэтому построим график потерь для каждого количества кластеров."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "outputs": [
    {
     "data": {
      "text/plain": "<Figure size 640x480 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAHACAYAAABeV0mSAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABPoUlEQVR4nO3dd3wUdeLG8c+mbQopBEhCQiAovYWaUEQPRRERDysnCAGsJ9iinmABG01PDj1QTqUKKOqJcgdiQQHBSAkEQXpNKAmEkk0hm2R3fn+g+7sIaIhJZrN53q/XvHBnZ2afzUL2ceY7MxbDMAxEREREPISX2QFEREREKpLKjYiIiHgUlRsRERHxKCo3IiIi4lFUbkRERMSjqNyIiIiIR1G5EREREY+iciMiIiIeReVGREREPIrKjYiIiHiUGl1uVq9eTf/+/YmOjsZisfDpp59e0vrPP/88FovlvCkoKKhyAouIiMjvqtHlJj8/n/j4eKZPn16u9Z944gmOHTtWamrVqhW33357BScVERGRsqrR5aZv3768/PLL3HzzzRd83m6388QTTxATE0NQUBCJiYmsXLnS9XytWrWIiopyTVlZWWzfvp277767it6BiIiI/FqNLje/Z9SoUaSkpPDBBx/w448/cvvtt3P99dezZ8+eCy7/7rvv0qxZM3r27FnFSUVEROQXKjcXkZ6ezuzZs/noo4/o2bMnl19+OU888QRXXHEFs2fPPm/5wsJCFixYoL02IiIiJvMxO4C72rp1Kw6Hg2bNmpWab7fbqVOnznnLL168mNzcXJKSkqoqooiIiFyAys1F5OXl4e3tTWpqKt7e3qWeq1Wr1nnLv/vuu9x4441ERkZWVUQRERG5AJWbi+jQoQMOh4Pjx4//7hiaAwcO8O2337JkyZIqSiciIiIXU6PLTV5eHnv37nU9PnDgAGlpaYSHh9OsWTMGDx7M0KFDee211+jQoQMnTpxgxYoVtGvXjn79+rnWmzVrFvXr16dv375mvA0RERH5HxbDMAyzQ5hl5cqV9OrV67z5SUlJzJkzh+LiYl5++WXmzZvHkSNHqFu3Ll27duWFF16gbdu2ADidTho1asTQoUMZP358Vb8FERER+ZUaXW5ERETE8+hUcBEREfEoKjciIiLiUWrcgGKn08nRo0cJDg7GYrGYHUdERETKwDAMcnNziY6Oxsvrt/fN1Lhyc/ToUWJjY82OISIiIuWQkZFBgwYNfnOZGldugoODgXM/nJCQEJPTiIiISFnYbDZiY2Nd3+O/pcaVm18ORYWEhKjciIiIVDNlGVKiAcUiIiLiUUwtN6tXr6Z///5ER0djsVj49NNPf3edBQsWEB8fT2BgIPXr12fEiBGcPHmy8sOKiIhItWBqucnPzyc+Pp7p06eXafm1a9cydOhQ7r77bn766Sc++ugj1q9fz7333lvJSUVERKS6MHXMTd++fS/pfkwpKSnExcXx8MMPA9C4cWPuv/9+Jk+eXFkRRUREpJqpVmNuunXrRkZGBsuWLcMwDLKysvj444+54YYbzI4mIiIibqJalZsePXqwYMECBg4ciJ+fH1FRUYSGhv7mYS273Y7NZis1iYiIiOeqVuVm+/btPPLII4wdO5bU1FSWL1/OwYMHeeCBBy66zsSJEwkNDXVNuoCfiIiIZ3Obu4JbLBYWL17MgAEDLrrMkCFDKCws5KOPPnLNW7NmDT179uTo0aPUr1//vHXsdjt2u931+JeLAOXk5Og6NyIiItWEzWYjNDS0TN/f1eoifgUFBfj4lI7s7e0NnLvnxIVYrVasVmulZxMRERH3YOphqby8PNLS0khLSwPgwIEDpKWlkZ6eDsCYMWMYOnSoa/n+/fvzySef8NZbb7F//37Wrl3Lww8/TEJCAtHR0Wa8BREREXEzpu652bhxI7169XI9Tk5OBiApKYk5c+Zw7NgxV9EBGDZsGLm5uUybNo3HH3+csLAwrr76ap0KLiIiIi5uM+amqlzKMTsRERFxD5fy/V2tzpYSERER97Yp/TQncu2/v2AlUrkRERGRCpGy7ySD31nHXe+u41R+kWk5qtXZUiIiIuKe1uzJ5p55GygsdhIRYiXQz9u0LCo3IiIi8oes3HWc+95LpajESa/m9Xjrrk74+6rciIiISDW0YkcWf52/iSKHk94tI5k+uANWH/OKDajciIiISDl98VMmoxZuothhcH3rKN64swN+PuYP51W5ERERkUu2bOsxHn5/MyVOg37t6jN1YHt8vc0vNqCzpUREROQSLdlylId+LjYD2kfzuhsVG9CeGxEREbkEizcf5vEPt+A04LZODZh8azu8vSxmxyrFfWqWiIiIuLUPN2aQ/HOx+UuXWF5xw2ID2nMjIiIiZbBwXTpPL94KwF1dG/LiTW3wcsNiAyo3IiIi8jvmpRxk7Gc/ATCsexzj+rfCYnHPYgMqNyIiIvIbZq05wIv/3Q7AvT0b8/QNLd262IDKjYiIiFzE26v3MWHZTgD++qfL+Vuf5m5fbEDlRkRERC5g+rd7efWLXQA8fHUTHru2WbUoNqByIyIiIr/y+td7+MfXuwFIvrYZD1/T1OREl0blRkRERAAwDIMpX+3mn9/sBeBv1zfnwT81MTnVpVO5EREREQzDYPLyXcxYtQ+AZ25oyb1XXmZyqvJRuREREanhDMPg5aU7mLnmAADj+rdieI/GJqcqP5UbERGRGswwDF74z3bmfH8QgJcGtGFI10bmhvqDVG5ERERqKKfT4LnPtrFgXToWC0y4uS13JjQ0O9YfpnIjIiJSAzmdBmM+2cqijRlYLPDKre24vXOs2bEqhMqNiIhIDeNwGjz58RY+2XQELwu8dkc8N3doYHasCqNyIyIiUoOUOJw8/tEWPks7ireXhakD29M/PtrsWBVK5UZERKSGKHY4eXRRGkt/PIaPl4V/3tmBvm3rmx2rwqnciIiI1ABFJU4een8TX/yUha+3hemDOnJd6yizY1UKlRsREREPZy9xMHLBJr7ecRw/by9mDOnI1S0izY5VaVRuREREPFhhsYO/zk/l210nsPp48fbQzlzVrJ7ZsSqVyo2IiIiHOlvk4L73NvLdnmz8fb2YmdSFHk3qmh2r0qnciIiIeKCCohLunrORlP0nCfTzZtawLnS9rI7ZsaqEyo2IiIiHybOXMGLOBtYfOEWQnzdzRiTQJS7c7FhVRuVGRETEg+QWFjNs9gZSD50m2OrD3LsT6NiwttmxqpTKjYiIiIfIOVtM0qz1pGWcIcTfh/fuTiQ+NszsWFVO5UZERMQDnCkoYsjM9Ww9kkNYoC/z706kTUyo2bFM4WXmi69evZr+/fsTHR2NxWLh008//d117HY7zzzzDI0aNcJqtRIXF8esWbMqP6yIiIibOpVfxKB31rH1SA7hQX4svKdrjS02YPKem/z8fOLj4xkxYgS33HJLmda54447yMrKYubMmTRp0oRjx47hdDorOamIiIh7ys6zc9e769iZmUvdWn4suKcrzaOCzY5lKlPLTd++fenbt2+Zl1++fDmrVq1i//79hIefG/UdFxdXSelERETc2/HcQga/s449x/OICLay8N6uNImoZXYs05l6WOpSLVmyhM6dO/PKK68QExNDs2bNeOKJJzh79qzZ0URERKpUlq2Qv7z9A3uO5xEV4s+i+7up2PysWg0o3r9/P2vWrMHf35/FixeTnZ3Ngw8+yMmTJ5k9e/YF17Hb7djtdtdjm81WVXFFREQqxdEzZxn0zg8cPFlATFgAC+9NpFGdILNjuY1qtefG6XRisVhYsGABCQkJ3HDDDUyZMoW5c+dedO/NxIkTCQ0NdU2xsbFVnFpERKTiHD5dwMC3Uzh4soAGtQP44L6uKja/Uq3KTf369YmJiSE09P9HgLds2RLDMDh8+PAF1xkzZgw5OTmuKSMjo6riioiIVKj0kwUM/NcPZJw6S6M6gSy6vxux4YFmx3I71arc9OjRg6NHj5KXl+eat3v3bry8vGjQoMEF17FarYSEhJSaREREqpsD2fkMfDuFI2fOclndIBbd142YsACzY7klU8tNXl4eaWlppKWlAXDgwAHS0tJIT08Hzu11GTp0qGv5QYMGUadOHYYPH8727dtZvXo1Tz75JCNGjCAgQB+wiIh4pn0n8hj4rxSO5RTSJKIWH9zXlahQf7NjuS1TBxRv3LiRXr16uR4nJycDkJSUxJw5czh27Jir6ADUqlWLr776ioceeojOnTtTp04d7rjjDl5++eUqzy4iIuZzOg2+3pGFr7cXseEBNKgdiL+vt9mxKtSerFzufGcd2Xl2mkcGM/+eROoFW82O5dYshmEYZoeoSjabjdDQUHJycnSISkSkmntn9X7GL9tRal7dWlYa1A4gNjzw3J+1A12Po8P8sfpUn/KzM9PG4HfWcTK/iJb1Q1hwTyLhQX5mxzLFpXx/V6tTwUVERH6RnWfnjRV7AIirE0h2XhF59hKy8+xk59lJyzhz3joWC0QG+1+0/ESF+uPr7R7DUX86msNd767jdEExbWJCmH93ImGBNbPYXCqVGxERqZb+8dVucu0ltIkJYcnIK7BYzt0V+/Dps2ScKjj35+mCUo/PFjvItBWSaStk46HT523TywL1QwMuWH4ahAcSFeKPt5el0t/b1sM53DVzHTlni4lvEMq8EYmEBvpW+ut6CpUbERGpdnZl5vL++nNjMsfe2BqvnwtHWKAfYYF+F7xppGEYnMwvumD5Ofzzn0UlTo6cOcuRM2dZd+DUedvw8bIQHRZwbnxPWKBrnM8vf9arZXVlKa/N6acZOms9uYUldGwYxpwRCYT4q9hcCpUbERGpVgzD4OWl23Ea0LdNFAmNw8u0nsVioW4tK3VrWWkfG3be806nQXae/by9Pb88PnL6LCVOg/RTBaSfKgBOnrcNPx8vGoSd28vz60NeDWoHUCfID4vl4uVn48FTDJu9gTx7CQlx4cwa3oVaVn1VXyr9xEREpFr5dtdxvtuTjZ+3F2P6tqyw7Xp5WYgI8ScixJ9Ojc5/3uE0yLIVXnCvT8apsxzLObfnZ392Pvuz8y/4GgG+3hc+5FU7kJP5dh5csImCIgddLwtn1rAuBPrpa7o89FMTEZFqo9jh5OWl586OGn5FHA3rVN3Veb1/PiQVHRZA4kWyZeYUnis9p34uPf9TfrJyCzlb7GDP8Tz2HM+7wBbOuaJJXd4Z2pkAv+pzVpe7UbkREZFqY/4Ph9h/Ip86QX6M6tXE7DilnLvWTuC52yFcfv7z9hIHR88Uusb3/HoP0Mk8O9e3iWLKHe097lo9VU3lRkREqoUzBUVM/frcqd/J1zUjuJoNsrX6eNO4bhCN6174JpclDic+bnIaenWnn6KIiFQLr6/YQ87ZYlpEBTOwc6zZcSqcik3F0U9SRETc3r4TebyXcgiAZ/u1UhGQ36S/HSIi4vYmLN1BidPgmhYRXNG0rtlxxM2p3IiIiFtbsyebFTuP4+Nl4el+FXfqt3gulRsREXFbJQ4nL/13OwBDujXi8nq1TE4k1YHKjYiIuK1FGzPYlZVLaIAvj1zT1Ow4Uk2o3IiIiFuyFRYz5cvdADzau6nuiC1lpnIjIiJuafq3ezmZX8Rl9YK4q+sF7ocgchEqNyIi4nbSTxYwe81BAJ7t1xJfnfotl0B/W0RExO1M/HwHRQ4nPZvWpVfzCLPjSDWjciMiIm5l3f6TfL4tEy/LuQv2WSwWsyNJNaNyIyIibsPpNHhp6blTv+9MaEjzqGCTE0l1pHIjIiJu49+bDrPtiI1gqw/J1zYzO45UUyo3IiLiFvLtJbz6xS4ARl3dhDq1rCYnkupK5UZERNzCv1bt43iunYbhgQzrEWd2HKnGVG5ERMR0R86c5V+r9wPw9A0tsPp4m5xIqjOVGxERMd0ry3diL3GS2DicPq2jzI4j1ZzKjYiImGpT+mk+SzuKxQLP3ahTv+WPU7kRERHTGIbhuuv3bR0b0CYm1ORE4glUbkRExDRLthxlc/oZAv28ebJPc7PjiIdQuREREVMUFjuY/PlOAP561eVEhPibnEg8hcqNiIiY4p3V+zmaU0h0qD/3XnmZ2XHEg6jciIhIlcuyFfLWqn0APNW3Bf6+OvVbKo7KjYiIVLm/f7GLgiIHHRqGcVN8tNlxxMOo3IiISJXadiSHjzcdBnTqt1QOlRsREakyhmHw4n+3Yxjw5/bRdGxY2+xI4oFMLTerV6+mf//+REdHY7FY+PTTT8u87tq1a/Hx8aF9+/aVlk9ERCrWFz9lsv7AKaw+Xvzt+hZmxxEPZWq5yc/PJz4+nunTp1/SemfOnGHo0KFcc801lZRMREQqmr3EwYRl5079vv/Ky4gJCzA5kXgqHzNfvG/fvvTt2/eS13vggQcYNGgQ3t7el7S3R0REzDNn7UHSTxUQEWzl/qsuNzuOeLBqN+Zm9uzZ7N+/n3HjxpVpebvdjs1mKzWJiEjVys6zM+2bvQA82ac5QVZT/99aPFy1Kjd79uxh9OjRzJ8/Hx+fsv3DmDhxIqGhoa4pNja2klOKiMiv/eOr3eTaS2gTE8KtHRuYHUc8XLUpNw6Hg0GDBvHCCy/QrFmzMq83ZswYcnJyXFNGRkYlphQRkV/bmWnj/fXpADzXrxVeXjr1WypXtdkvmJuby8aNG9m8eTOjRo0CwOl0YhgGPj4+fPnll1x99dXnrWe1WrFarVUdV0REOHfq9/ilO3Aa0LdNFImX1TE7ktQA1abchISEsHXr1lLz3nzzTb755hs+/vhjGjdubFIyERG5mG93Hee7Pdn4eXsxpm9Ls+NIDWFqucnLy2Pv3r2uxwcOHCAtLY3w8HAaNmzImDFjOHLkCPPmzcPLy4s2bdqUWj8iIgJ/f//z5ouIiPmKHU5eXroDgOFXxNGwTqDJiaSmMLXcbNy4kV69erkeJycnA5CUlMScOXM4duwY6enpZsUTEZE/YP4Ph9h/Ip86QX6M6tXE7DhSg1gMwzDMDlGVbDYboaGh5OTkEBISYnYcERGPdKagiKteXUnO2WLG39yGwYmNzI4k1dylfH9Xm7OlRESk+nh9xR5yzhbTPDKYgZ11CQ6pWio3IiJSofadyOO9lEMAPHtjS3y89VUjVUt/40REpEJNWLqDEqfBNS0i6Nm0ntlxpAZSuRERkQrz3Z4TrNh5HB8vC0/306nfYg6VGxERqRAlDicv//fcqd9DujXi8nq1TE4kNZXKjYiIVIhFGzPYlZVLaIAvj1zT1Ow4UoOp3IiIyB9mKyxmype7AXi0d1PCAv1MTiQ1mcqNiIj8YdO/2cvJ/CIuqxfEXV11TRsxl8qNiIj8IeknC5i99iAAz/Zria9O/RaT6W+giIj8IRM/30GRw0nPpnXp1TzC7DgiKjciIlJ+P+w/yefbMvGywLP9WmGxWMyOJKJyIyIi5eN0Gry8dDsAdyY0pHlUsMmJRM5RuRERkXL596bDbDtiI9jqQ/K1zcyOI+KiciMiIpcs317Cq1/sAmDU1U2oU8tqciKR/6dyIyIil2zGqn0cz7XTMDyQYT3izI4jUorKjYiIXJIjZ87y9ur9ADx9QwusPt4mJxIpTeVGREQuySvLd2IvcZLYOJw+raPMjiNyHpUbEREps03pp/ks7SgWCzx3o079FvekciMiImViGAYv/ffcqd+3dWxAm5hQkxOJXJjKjYiIlMmSLUfZnH6GQD9vnuzT3Ow4IhelciMiIr/rbJGDyZ/vBOCvV11ORIi/yYlELk7lRkREfte73+3naE4h0aH+3HvlZWbHEflNKjciIvKbsmyFvLVqHwBP9W2Bv69O/Rb3pnIjIiK/6e9f7KKgyEGHhmHcFB9tdhyR36VyIyIiF7XtSA4fbzoM6NRvqT5UbkRE5IIMw+DF/27HMODP7aPp2LC22ZFEykTlRkRELuiLnzJZf+AUVh8v/nZ9C7PjiJSZyo2IiJzHXuJgwrJzp37fd+VlxIQFmJxIpOxUbkRE5Dxz1h4k/VQBEcFWHrjqcrPjiFwSlRsRESklO8/OtG/2AvBkn+YEWX1MTiRyaVRuRESklClf7SbXXkKbmBBu7djA7Dgil0zlRkREXHZm2vhgfToAz/VrhZeXTv2W6kflRkREgHOnfo9fugOnAX3bRJF4WR2zI4mUi8qNiIgA8O2u43y3Jxs/by/G9G1pdhyRcjO13KxevZr+/fsTHR2NxWLh008//c3lP/nkE6699lrq1atHSEgI3bp144svvqiasCIiHqzY4eTlpTsAGN4jjoZ1Ak1OJFJ+ppab/Px84uPjmT59epmWX716Nddeey3Lli0jNTWVXr160b9/fzZv3lzJSUVEPNv8Hw6x/0Q+dYL8GHl1E7PjiPwhpp7f17dvX/r27Vvm5adOnVrq8YQJE/jss8/4z3/+Q4cOHSo4nYhIzXCmoIipX+8BIPm6ZoT4+5qcSOSPqdYXL3A6neTm5hIeHn7RZex2O3a73fXYZrNVRTQRkWpj6td7yDlbTPPIYAZ2jjU7jsgfVq0HFP/9738nLy+PO+6446LLTJw4kdDQUNcUG6t/uCIiv9h3Io/5PxwC4NkbW+LjXa2/FkSAalxuFi5cyAsvvMCHH35IRETERZcbM2YMOTk5rikjI6MKU4qIuLcJS3dQ4jS4pkUEPZvWMzuOSIWoloelPvjgA+655x4++ugjevfu/ZvLWq1WrFZrFSUTEak+vttzghU7j+PjZeHpfjr1WzxHtdtz8/777zN8+HDef/99+vXrZ3YcEZFqqcTh5OX/njv1e0i3Rlxer5bJiUQqjql7bvLy8ti7d6/r8YEDB0hLSyM8PJyGDRsyZswYjhw5wrx584Bzh6KSkpJ4/fXXSUxMJDMzE4CAgABCQ0NNeQ8iItXNgex8xi/dwa6sXEIDfHnkmqZmRxKpUKbuudm4cSMdOnRwncadnJxMhw4dGDt2LADHjh0jPT3dtfzbb79NSUkJI0eOpH79+q7pkUceMSW/iEh1cjq/iOeX/MS1U1bx9Y4svCww9sZWhAX6mR1NpEJZDMMwzA5RlWw2G6GhoeTk5BASEmJ2HBGRSmcvcTDv+0P885s92ApLAPhT83o8fUNLmkUGm5xOpGwu5fu7Wg4oFhGR32cYBsu2ZjJp+Q4yTp0FoEVUMM/0a6kzo8SjqdyIiHig1EOnGb90O5vSzwAQEWzlieuac2unBnh7WcwNJ1LJVG5ERDxIxqkCJi3fydIfjwEQ4OvNfVdexn1XXkaQVb/ypWbQ33QREQ+Qc7aY6d/uZc7agxQ5nFgscHunBjx+XXMiQ/zNjidSpVRuRESqsWKHk/k/HOL1FXs4U1AMwBVN6vL0DS1pFa2TJqRmUrkREamGDMPgy+1ZTPp8Jwey8wFoElGLZ25oyZ+a18Ni0bgaqblUbkREqpkfD5/h5aU7WH/gFAB1gvx47Npm/KVLrG58KYLKjYhItXH0zFle/WIXizcfAcDq48U9PRvzwFWXE+zva3I6EfehciMi4uZyC4t5a+U+Zq45gL3ECcDNHWJ4ok9zYsICTE4n4n5UbkRE3FSJw8kHGzKY+vVusvOKAEhoHM6z/VrSrkGYueFE3JjKjYiImzEMg293HWfCsp3sPZ4HwGV1gxjdtwXXtorUYGGR36FyIyLiRrYftTF+2XbW7j0JQO3Ac3ftHty1Eb4aLCxSJio3IiJuIMtWyN+/2MXHmw5jGODn7cXwHnE82KsJoQEaLCxyKVRuRERMVFBUwr9W7eft1fs5W+wA4MZ29Xnq+hbEhgeanE6kelK5ERExgcNp8O/Uw/z9y10cz7UD0LFhGM/e2IqODWubnE6kelO5ERGpYt/tOcH4pTvYmZkLQGx4AKOvb8kNbaM0WFikAqjciIhUkd1ZuUxYtoOVu04AEOLvw8PXNGVIt0ZYfbxNTifiOVRuREQq2YlcO1O+2s2iDek4DfDxsjCkWyMevroptYP8zI4n4nFUbkREKklhsYN3v9vPWyv3kV90brDw9a2jeKpvCxrXDTI5nYjnUrkREalgTqfBp2lHePWLXRzLKQQgvkEoz/RrRULjcJPTiXg+lRsRkQr0w/6TjF+6g61HcgCICQvgb9c3p3+7aLy8NFhYpCqo3IiIVIB9J/KYuGwnX+/IAiDY6sODvZowvEcc/r4aLCxSlVRuRET+gFP5Rbz+9W4WrEunxGng7WVhUEJDHu3dlDq1rGbHE6mRylVuMjIysFgsNGjQAID169ezcOFCWrVqxX333VehAUVE3FFhsYO53x9k2rd7yS0sAaB3ywhG921Bk4hgk9OJ1GzlKjeDBg3ivvvuY8iQIWRmZnLttdfSunVrFixYQGZmJmPHjq3onCIipjMMg5yzxazek80ry3dy+PRZAFrVD+HZfi3p3qSuyQlFBMpZbrZt20ZCQgIAH374IW3atGHt2rV8+eWXPPDAAyo3IlLtFBY7yLIVkplTSFaunaycwnOPbYUct9nJtJ17bC9xutaJCvHniT7NuaVDjAYLi7iRcpWb4uJirNZzx5K//vprbrrpJgBatGjBsWPHKi6diMgf5HAaZOfZL1pcsmyFZNns5JwtLvM26wVbGdK1Eff2vIwAPw0WFnE35So3rVu3ZsaMGfTr14+vvvqKl156CYCjR49Sp06dCg0oInIhhmFgKyz5uZycKy7Hc+1k5vyyt+Xcnydy7TiNsm3T39eLqBB/IkL8iQrxJzLESmSIP1Gh/uf+DPGnXrBVZz+JuLlylZvJkydz88038+qrr5KUlER8fDwAS5YscR2uEhEpr8JiB8dtdrJyf97b8j97WP63uBQWO39/Y4C3l4V6tayusvJLYYkItrqKS2SIPyH+PrpxpYgHsBiGUcb/pynN4XBgs9moXbu2a97BgwcJDAwkIiKiwgJWNJvNRmhoKDk5OYSEhJgdR6RGcTgNTubbycqx/894ll8OD9ldJeZ0QdkPEYUG+J7byxLqT+TPZeV/97xEhfhTp5YVb42JEanWLuX7u1x7bs6ePYthGK5ic+jQIRYvXkzLli3p06dPeTYpIh7M6TSY/MVOZq85SJGjbHtbrD5e5/aqBF+8uESG+OsQkYicp1zl5s9//jO33HILDzzwAGfOnCExMRFfX1+ys7OZMmUKf/3rXys6p4hUU/YSB8kfbmHpj+dONvCyQN1aPxeVYH+iQq2/Gudy7s+QAB0iEpHyKVe52bRpE//4xz8A+Pjjj4mMjGTz5s38+9//ZuzYsSo3IgKArbCY++elkrL/JL7eFl69LZ4b29XHx9vL7Ggi4sHK9RumoKCA4OBzV+D88ssvueWWW/Dy8qJr164cOnSozNtZvXo1/fv3Jzo6GovFwqeffvq766xcuZKOHTtitVpp0qQJc+bMKc9bEJFKlmUr5I4ZKaTsP0ktqw9zhicwoEOMio2IVLpy/ZZp0qQJn376KRkZGXzxxRdcd911ABw/fvySBunm5+cTHx/P9OnTy7T8gQMH6NevH7169SItLY1HH32Ue+65hy+++KI8b0NEKsm+E3nc8ub37MzMpW4tKx/c15UeunqviFSRch2WGjt2LIMGDeKxxx7j6quvplu3bsC5vTgdOnQo83b69u1L3759y7z8jBkzaNy4Ma+99hoALVu2ZM2aNfzjH//QQGYRN5F66DR3z93AmYJiGtcNYt6IBGLDA82OJSI1SLnKzW233cYVV1zBsWPHXNe4Abjmmmu4+eabKyzcr6WkpNC7d+9S8/r06cOjjz5aaa8pImW3YkcWIxduorDYSXxsGLOSOuvO2CJS5cpVbgCioqKIiori8OHDADRo0KDSL+CXmZlJZGRkqXmRkZHYbDbOnj1LQEDAeevY7Xbsdrvrsc1mq9SMIjXVog3pPL14Gw6nQa/m9Zg+uCOBfuX+FSMiUm7lGnPjdDp58cUXCQ0NpVGjRjRq1IiwsDBeeuklnM6yXcOiqkycOJHQ0FDXFBsba3YkEY9iGAZvrNjDU//eisNpcFunBrw9tLOKjYiYply/fZ555hlmzpzJpEmT6NGjBwBr1qzh+eefp7CwkPHjx1doyF9ERUWRlZVVal5WVhYhISEX3GsDMGbMGJKTk12PbTabCo5IBXE4DcZ+to0F69IBGNWrCY9f10zXpxERU5Wr3MydO5d3333XdTdwgHbt2hETE8ODDz5YaeWmW7duLFu2rNS8r776yjWg+UKsVqvrDuYiUnEKix08/P5mvtyehcUCL97UmiHd4syOJSJSvsNSp06dokWLFufNb9GiBadOnSrzdvLy8khLSyMtLQ04d6p3Wloa6enn/i9wzJgxDB061LX8Aw88wP79+/nb3/7Gzp07efPNN/nwww957LHHyvM2RKSczhQUcde76/hyexZ+Pl68Oaijio2IuI1ylZv4+HimTZt23vxp06bRrl27Mm9n48aNdOjQwXX6eHJyMh06dGDs2LEAHDt2zFV0ABo3bszSpUv56quviI+P57XXXuPdd9/VaeAiVejombPcPiOFjYdOE+zvw3sjEujbtr7ZsUREXMp1V/BVq1bRr18/GjZs6DoklJKSQkZGBsuWLaNnz54VHrSi6K7gIuW3KzOXpFnrybQVEhXiz9wRCTSPCjY7lojUAJfy/V2uPTdXXXUVu3fv5uabb+bMmTOcOXOGW265hZ9++on33nuvXKFFxL2t23+S22d8T6atkCYRtfjkwe4qNiLilsq15+ZitmzZQseOHXE4HBW1yQqnPTcil275tmM8/EEaRSVOOjeqzbtJnQkL9DM7lojUIJfy/a0LUYjIb3ov5SBjl/yEYcB1rSJ5484O+Pt6mx1LROSiVG5E5IIMw+C1L3cz7du9AAxKbMhLf26Dt5euYSMi7k3lRkTOU+xw8szirXy48dztVZKvbcZDVzfRxflEpFq4pHJzyy23/ObzZ86c+SNZRMQNFBSVMHLBJr7ddQIvC0y4uS1/SWhodiwRkTK7pHITGhr6u8//70X3RKR6OZVfxPA5G9iScQZ/Xy+m3dmR3q0if39FERE3cknlZvbs2ZWVQ0RMlnGqgKRZ69mfnU9YoC8zk7rQqVFts2OJiFwyjbkREbYdyWH4nA2cyLUTExbA3BEJNImoZXYsEZFyUbkRqeHW7s3m/vdSybOX0CIqmLkjEogM8Tc7lohIuanciNRgS7Yc5fEP0yh2GHS9LJy3h3YmxN/X7FgiIn+Iyo1IDfXud/t5eekOAPq1rc+UgfFYfXRxPhGp/lRuRGoYp9Ng0vKdvL16PwDDuscx9sZWeOnifCLiIVRuRGqQohInf/t4C5+mHQVgdN8W3H/lZbo4n4h4FJUbkRoiz17CX+en8t2ebHy8LEy+tR23dmpgdiwRkQqnciNSA5zItTN8znq2HbER6OfNm4M78qfmEWbHEhGpFCo3Ih7uQHY+SbPWk36qgDpBfswa1oX42DCzY4mIVBqVGxEPtiXjDCPmbOBkfhENwwOZNyKBuLpBZscSEalUKjciHmrlruP8df4mzhY7aBMTwuxhCdQLtpodS0Sk0qnciHigf6ce5ql//0iJ06Bn07q8dVcnaln1z11Eagb9thPxIIZh8NaqfbyyfBcAA9pH88pt8fj5eJmcTESk6qjciHgIh9Pgpf9uZ873BwG4/8rLeOr6Fro4n4jUOCo3Ih6gsNjB4x9uYenWYwA8d2Mr7r6iscmpRETMoXIjUs3lnC3mvnkbWXfgFL7eFl67oz03xUebHUtExDQqNyLVWGZOIcNmr2dnZi61rD68PaQT3ZvUNTuWiIipVG5Eqqm9x3NJmrWBI2fOUi/YypzhXWgdHWp2LBER06nciFRDqYdOcffcjZwpKOayukHMHZFAbHig2bFERNyCyo1INfPV9ixGLdyEvcRJ+9gwZg3rQniQn9mxRETchsqNSDXy/vp0nlm8FacBvZrXY/rgjgT66Z+xiMj/0m9FkWrAMAxeX7GHqV/vAeCOzg2YcHNbfLx1cT4RkV9TuRFxcyUOJ8999hPvr08H4KGrm5B8bTMsFl2cT0TkQlRuRNxYYbGDh9/fzJfbs7BY4MWbWjOkW5zZsURE3JrKjYibyreXcN97G1m79yR+Pl688Zf2XN+mvtmxRETcnsqNiBs6U1DEsNkbSMs4Q5CfN+8kdab75bo4n4hIWbjFaMTp06cTFxeHv78/iYmJrF+//jeXnzp1Ks2bNycgIIDY2Fgee+wxCgsLqyitSOU6bitk4L9+IC3jDGGBviy4t6uKjYjIJTC93CxatIjk5GTGjRvHpk2biI+Pp0+fPhw/fvyCyy9cuJDRo0czbtw4duzYwcyZM1m0aBFPP/10FScXqXgZpwq4/V8p7MrKJSLYyqL7utE+NszsWCIi1Yrp5WbKlCnce++9DB8+nFatWjFjxgwCAwOZNWvWBZf//vvv6dGjB4MGDSIuLo7rrruOO++883f39oi4uz1Zudw243sOnSwgNjyAjx/oTvOoYLNjiYhUO6aWm6KiIlJTU+ndu7drnpeXF7179yYlJeWC63Tv3p3U1FRXmdm/fz/Lli3jhhtuqJLMIpXhx8NnuONfKWTZ7DSNqMXHD3SnYR3dTkFEpDxMHVCcnZ2Nw+EgMjKy1PzIyEh27tx5wXUGDRpEdnY2V1xxBYZhUFJSwgMPPHDRw1J2ux273e56bLPZKu4NiFSAH/af5J65G8mzlxDfIJQ5wxOordspiIiUm+mHpS7VypUrmTBhAm+++SabNm3ik08+YenSpbz00ksXXH7ixImEhoa6ptjY2CpOLHJxK3ZkkTRrPXn2ErpdVocF93ZVsRER+YMshmEYZr14UVERgYGBfPzxxwwYMMA1PykpiTNnzvDZZ5+dt07Pnj3p2rUrr776qmve/Pnzue+++8jLy8PLq3Rfu9Cem9jYWHJycggJCan4NyVSRp+lHeHxD7dQ4jTo3TKCaYM64u/rbXYsERG3ZLPZCA0NLdP3t6l7bvz8/OjUqRMrVqxwzXM6naxYsYJu3bpdcJ2CgoLzCoy397kvhAv1NKvVSkhISKlJxGzzfzjEo4vSKHEaDGgfzVt3dVKxERGpIKZfxC85OZmkpCQ6d+5MQkICU6dOJT8/n+HDhwMwdOhQYmJimDhxIgD9+/dnypQpdOjQgcTERPbu3ctzzz1H//79XSVHxJ29uXIvryzfBcCQro144abWeHnpPlEiIhXF9HIzcOBATpw4wdixY8nMzKR9+/YsX77cNcg4PT291J6aZ599FovFwrPPPsuRI0eoV68e/fv3Z/z48Wa9BZEyMQyDyct3MWPVPgBG9rqcJ65rrhtgiohUMFPH3JjhUo7ZiVQUh9Pguc+2sXDduTt7j+nbgvuvutzkVCIi1celfH+bvudGxNMVO5wkf7iF/2w5isUCE25uy50JDc2OJSLisVRuRCpRYbGDBxds4pudx/HxsvCPge3pHx9tdiwREY+mciNSSXILi7l77kbWHziFv68Xb93ViV7NI8yOJSLi8VRuRCrBqfwikmatZ+uRHIKtPswc1oWExuFmxxIRqRFUbkQq2LGcswyZuZ69x/MID/Jj3ogE2sSEmh1LRKTGULkRqUAHs/MZ/O46jpw5S/1Qf967O5EmEbXMjiUiUqOo3IhUkB3HbAyZuZ7sPDtxdQKZf08iDWrrzt4iIlVN5UakAmxKP82wWeuxFZbQIiqY9+5OpF6w1exYIiI1ksqNyB+0Zk829723kYIiBx0bhjF7WAKhgb5mxxIRqbFUbkT+gOXbMnn4/c0UOZz0bFqXfw3pRKCf/lmJiJhJv4VFyunj1MP87eMtOA24vnUUr9/ZHquPbt4qImI2lRuRcpi99gAv/Gc7ALd1asCkW9ri4+31O2uJiEhVULkRuQSGYfDGir384+vdAIzo0Zhn+7XEy0t39hYRcRcqNyJlZBgGLy/dwcw1BwB4rHczHr6mCRaLio2IiDtRuREpgxKHkzGfbOWj1MMAjL2xFSOuaGxyKhERuRCVG5HfYS9x8OgHaXy+LRMvC7xyWzy3dWpgdiwREbkIlRuR31BQVML976Xy3Z5s/Ly9eOPO9lzfpr7ZsURE5Deo3IhcRM7ZYkbM2UDqodME+HrzztDOXNG0rtmxRETkd6jciFzAiVw7Q2etZ8cxGyH+PswenkCnRrXNjiUiImWgciPyK4dPFzBk5noOZOdTt5aV9+5OoGX9ELNjiYhIGanciPyPvcfzGDJzHcdyCokJC2D+PYk0rhtkdiwREbkEKjciP9t2JIekWes5mV/E5fWCmH9PIvVDA8yOJSIil0jlRgRYf+AUd8/ZQK69hDYxIcwdnkCdWlazY4mISDmo3EiNt3LXcR6Yn0phsZOEuHDeHdaZEH9fs2OJiEg5qdxIjfbfH4/y2KI0ih0GvZrX483BnQjw0529RUSqM5UbqbE+WJ/OmMVbMQy4sV19ptzRHj8f3dlbRKS6U7mRGunt1fuYsGwnAHcmNOTlAW3w1p29RUQ8gsqN1CiGYfDal7uZ9u1eAO6/6jJGX99Cd/YWEfEgKjdSYzidBs//5yfmpRwC4G/XN+fBPzUxOZWIiFQ0lRupEYodTv728Y8s3nwEiwVe/HMbhnRtZHYsERGpBCo34vEKix2MWriZr3dk4e1lYcod8fy5fYzZsUREpJKo3IhHy7OXcO/cjaTsP4nVx4s3B3fkmpaRZscSEZFKpHIjHut0fhHD5mxgS8YZgvy8eTepC90ur2N2LBERqWQqN+JxjuWcZeG6dN5fn052XhG1A32ZOyKBdg3CzI4mIiJVwC2uWDZ9+nTi4uLw9/cnMTGR9evX/+byZ86cYeTIkdSvXx+r1UqzZs1YtmxZFaUVd2QYBj/sP8mDC1K5YvK3/PObvWTnFREbHsCH93dTsRERqUFM33OzaNEikpOTmTFjBomJiUydOpU+ffqwa9cuIiIizlu+qKiIa6+9loiICD7++GNiYmI4dOgQYWFhVR9eTJdvL+HTtCPM+/4Qu7JyXfO7XhZOUrc4rm0ViY+3W3R4ERGpIhbDMAwzAyQmJtKlSxemTZsGgNPpJDY2loceeojRo0eft/yMGTN49dVX2blzJ76+l35zQ5vNRmhoKDk5OYSEhPzh/GKOA9n5vJdyiI9SM8gtLAEgwNebmzvGMLRbI1pE6bMVEfEkl/L9beqem6KiIlJTUxkzZoxrnpeXF7179yYlJeWC6yxZsoRu3boxcuRIPvvsM+rVq8egQYN46qmn8PY+/4aHdrsdu93uemyz2Sr+jUiVcDoNVu0+wZzvD7Jq9wnX/Lg6gQzpFsdtnRoQGqC7eYuI1HSmlpvs7GwcDgeRkaVPzY2MjGTnzp0XXGf//v188803DB48mGXLlrF3714efPBBiouLGTdu3HnLT5w4kRdeeKFS8kvVyCko5qPUDOalHCL9VAEAFgv0ah7B0G6NuLJpPbx0XygREfmZ6WNuLpXT6SQiIoK3334bb29vOnXqxJEjR3j11VcvWG7GjBlDcnKy67HNZiM2NrYqI0s57ThmY17KQRZvPkJhsROAEH8f7ugcy11dGxFXN8jkhCIi4o5MLTd169bF29ubrKysUvOzsrKIioq64Dr169fH19e31CGoli1bkpmZSVFREX5+fqWWt1qtWK3Wig8vlaLY4eSLnzKZ9/0h1h885ZrfIiqYpO5x/Ll9NIF+1a6Ti4hIFTL1W8LPz49OnTqxYsUKBgwYAJzbM7NixQpGjRp1wXV69OjBwoULcTqdeHmdOwtm9+7d1K9f/7xiI9XH8dxC3l+XwcL1h8iynRsj5e1l4fo2USR1i6NLXG3duVtERMrE9P8FTk5OJikpic6dO5OQkMDUqVPJz89n+PDhAAwdOpSYmBgmTpwIwF//+lemTZvGI488wkMPPcSePXuYMGECDz/8sJlvQ8rBMAw2pZ9hXspBlm09RrHj3Il7dWtZGZTYkEEJDYkK9Tc5pYiIVDeml5uBAwdy4sQJxo4dS2ZmJu3bt2f58uWuQcbp6emuPTQAsbGxfPHFFzz22GO0a9eOmJgYHnnkEZ566imz3oJcosJiB0u2HGVeykG2Hfn/s9c6NgwjqXscfdvUx89H16YREZHyMf06N1VN17kxT8apAuavO8SiDRmcKSgGwM/Hiz/HRzO0WxxtG4SanFBERNxVtbnOjXg+wzBYu/ckc1MOsmJHFs6fq3RMWABDujXijs6xhAdprJSIiFQclRupFLmFxXyy6QhzUw6y/0S+a/4VTeoytFsjrmkZibeuTSMiIpVA5UYq1N7jucxLOcS/Uw+TX+QAIMjPm9s6NWBIt0Y0iQg2OaGIiHg6lRv5wxxOg693ZDEv5SBr9550zb+8XhBJ3eO4uUMMwf66LYKIiFQNlRspt1P5RSzakMH8Hw5x5MxZALws0LtlJEnd4+h+eR1dm0ZERKqcyo1csq2Hc5ibcpAlW45SVHLutgi1A30Z2KUhd3VtSIPagSYnFBGRmkzlRsrEXuLg862ZzE05yOb0M675bWJCSOoWR//4aPx9z78ru4iISFVTuZHfdCznLAvXpfP++nSy84oA8PW20K9tfYZ2j6NDbJgOPYmIiFtRuZHzGIbB+gOnmJdyiOU/ZeL4+eI0kSFWBic24s6EhtQL1s1IRUTEPancSClnixzcPXcD3+/7/7OeEhqHk9QtjutaR+LrrdsiiIiIe1O5ERen0yD5wzS+33cSf18vbu7QgKHdGtGyvm5TISIi1YfKjbi8+uUuPt+Wia+3hXkjEkloHG52JBERkUumYwwCwIcbMnhr5T4AJt/aTsVGRESqLZUb4ft92Ty9eCsAD1/dhFs6NjA5kYiISPmp3NRw+07k8cB7qZQ4DfrHR/PYtc3MjiQiIvKHqNzUYKfyixgxZwO2whI6Ngzj1dva6Zo1IiJS7anc1FD2Egf3v7eRQycLaFA7gLeHdtYVhkVExCOo3NRAhmEw+t9b2XDwNMFWH2YP60LdWroon4iIeAaVmxron9/sZfHmI3h7WXjzro40jQw2O5KIiEiFUbmpYT5LO8KUr3YD8NKf29CzaT2TE4mIiFQslZsaJPXQKZ78+EcA7u3ZmEGJDU1OJCIiUvFUbmqI9JMF3DcvlaISJ9e2imR035ZmRxIREakUKjc1QM7ZYkbM3cDJ/CJaR4fw+l/a4+2lU75FRMQzqdx4uGKHk5ELNrH3eB5RIf7MTOpCoJ9uKSYiIp5L5caDGYbB2M+2sWZvNoF+3ryb1JmoUH+zY4mIiFQqlRsP9u53B3h/fQYWC7zxlw60iQk1O5KIiEilU7nxUF/8lMmEz3cA8Gy/VvRuFWlyIhERkaqhcuOBth7O4dEP0jAMuKtrQ0b0iDM7koiISJVRufEwx3LOcvfcDZwtdtCzaV2e799aN8MUEZEaReXGg+TbSxgxZyPHc+00i6zF9MEd8fHWRywiIjWLvvk8hMNp8PD7m9lxzEbdWn7MTOpCiL+v2bFERESqnMqNhxi/dAcrdh7H6uPF20M7ExseaHYkERERU6jceID3Ug4ya+0BAF67I56ODWubnEhERMQ8KjfV3Mpdx3n+P9sBeLJPc25sF21yIhEREXO5RbmZPn06cXFx+Pv7k5iYyPr168u03gcffIDFYmHAgAGVG9BN7crMZdTCzTicBrd2bMCDf7rc7EgiIiKmM73cLFq0iOTkZMaNG8emTZuIj4+nT58+HD9+/DfXO3jwIE888QQ9e/asoqTu5XhuISPmbCDPXkJi43Am3tJWp3yLiIjgBuVmypQp3HvvvQwfPpxWrVoxY8YMAgMDmTVr1kXXcTgcDB48mBdeeIHLLrusCtO6h8JiB/fNS+XImbM0rhvEjLs64edj+kcpIiLiFkz9RiwqKiI1NZXevXu75nl5edG7d29SUlIuut6LL75IREQEd9999+++ht1ux2azlZqqM6fT4PEPt5CWcYawQF9mDetC7SA/s2OJiIi4DVPLTXZ2Ng6Hg8jI0vc9ioyMJDMz84LrrFmzhpkzZ/LOO++U6TUmTpxIaGioa4qNjf3Duc005avdLN16DF9vCzPu6kTjukFmRxIREXEr1epYRm5uLkOGDOGdd96hbt26ZVpnzJgx5OTkuKaMjIxKTll5Pk49zLRv9wIw8ZZ2dL2sjsmJRERE3I+PmS9et25dvL29ycrKKjU/KyuLqKio85bft28fBw8epH///q55TqcTAB8fH3bt2sXll5c+Y8hqtWK1WishfdX6Yf9JxnzyIwCjejXhtk4NTE4kIiLinkzdc+Pn50enTp1YsWKFa57T6WTFihV069btvOVbtGjB1q1bSUtLc0033XQTvXr1Ii0trdofcrqY/SfyuP+9VIodBv3a1Sf52mZmRxIREXFbpu65AUhOTiYpKYnOnTuTkJDA1KlTyc/PZ/jw4QAMHTqUmJgYJk6ciL+/P23atCm1flhYGMB58z3F6fwi7p67kZyzxbSPDeO12+Px8tIp3yIiIhdjerkZOHAgJ06cYOzYsWRmZtK+fXuWL1/uGmScnp6Ol1e1GhpUYYpKnNw/P5UD2fnEhAXwztDO+Pt6mx1LRETErVkMwzDMDlGVbDYboaGh5OTkEBISYnacizIMgyc++pF/bzpMsNWHfz/YnWaRwWbHEhERMcWlfH/XzF0i1cCbK/fx702H8fayMG1wRxUbERGRMlK5cUP//fEor36xC4Dnb2rNVc3qmZxIRESk+lC5cTOb0k+T/OEWAO6+ojFDujYyOZGIiEj1onLjRjJOFXDfvI0UlTjp3TKCp29oaXYkERGRakflxk3YCou5e+4GsvOKaFU/hNf/0gFvnfItIiJyyVRu3ECJw8nIBZvYnZVHZIiVmcM6E2Q1/Sx9ERGRaknlxmSGYTBuyU98tyebAF9vZiZ1oX5ogNmxREREqi2VG5PNWnuQBevSsVjg9b+0p01MqNmRREREqjWVGxN9vT2Ll5duB+CZG1pyXevzbxYqIiIil0blxiTbjuTw8AebMQwYlNiQu69obHYkERERj6ByY4LMnELumbuRgiIHPZvW5YWbWmOx6MwoERGRiqByU8Xy7SXcPXcDmbZCmkbUYtqgjvh662MQERGpKPpWrUIOp8EjH6Tx01EbdYL8mDWsC6EBvmbHEhER8SgqN1Vo0uc7+HpHFn4+Xrw9tDOx4YFmRxIREfE4KjdVZMG6Q7zz3QEAXrs9nk6NapucSERExDOp3FSB7/acYOxnPwHw+LXN6B8fbXIiERERz6VyU8n2ZOXy4PxNOJwGt3SIYdTVTcyOJCIi4tFUbipRdp6d4XM2kGsvoUtcbSbe2lanfIuIiFQylZtKUljs4N55Gzl8+iyN6gTyryGdsfp4mx1LRETE46ncVAKn0+CJj7awOf0MIf4+zBrWhfAgP7NjiYiI1AgqN5Vg6te7+e+Px/DxsjBjSCcur1fL7EgiIiI1hspNBft36mHe+GYvABNuaUv3y+uanEhERKRmUbmpQOv2n2T0Jz8C8Nc/Xc4dnWNNTiQiIlLzqNxUkIPZ+dw/P5Vih0HfNlE8eV1zsyOJiIjUSD5mB/AUFguEB/nRKDyQKXe0x8tLp3yLiIiYQeWmgjSqE8Tiv/agyOEkwE+nfIuIiJhF5aYChQbqDt8iIiJm05gbERER8SgqNyIiIuJRVG5ERETEo6jciIiIiEdRuRERERGPonIjIiIiHkXlRkRERDyKW5Sb6dOnExcXh7+/P4mJiaxfv/6iy77zzjv07NmT2rVrU7t2bXr37v2by4uIiEjNYnq5WbRoEcnJyYwbN45NmzYRHx9Pnz59OH78+AWXX7lyJXfeeSfffvstKSkpxMbGct1113HkyJEqTi4iIiLuyGIYhmFmgMTERLp06cK0adMAcDqdxMbG8tBDDzF69OjfXd/hcFC7dm2mTZvG0KFDf3d5m81GaGgoOTk5hISE/OH8IiIiUvku5fvb1D03RUVFpKam0rt3b9c8Ly8vevfuTUpKSpm2UVBQQHFxMeHh4Rd83m63Y7PZSk0iIiLiuUwtN9nZ2TgcDiIjI0vNj4yMJDMzs0zbeOqpp4iOji5VkP7XxIkTCQ0NdU2xsbF/OLeIiIi4L9PH3PwRkyZN4oMPPmDx4sX4+/tfcJkxY8aQk5PjmjIyMqo4pYiIiFQlU+8KXrduXby9vcnKyio1Pysri6ioqN9c9+9//zuTJk3i66+/pl27dhddzmq1YrVaXY9/GWKkw1MiIiLVxy/f22UaKmyYLCEhwRg1apTrscPhMGJiYoyJEydedJ3JkycbISEhRkpKyiW/XkZGhgFo0qRJkyZNmqrhlJGR8bvf9abuuQFITk4mKSmJzp07k5CQwNSpU8nPz2f48OEADB06lJiYGCZOnAjA5MmTGTt2LAsXLiQuLs41NqdWrVrUqlXrd18vOjqajIwMgoODsVgslffGqjGbzUZsbCwZGRk6o8wN6PNwL/o83I8+E/dSWZ+HYRjk5uYSHR39u8uaXm4GDhzIiRMnGDt2LJmZmbRv357ly5e7Bhmnp6fj5fX/Q4PeeustioqKuO2220ptZ9y4cTz//PO/+3peXl40aNCgQt+DpwoJCdEvCjeiz8O96PNwP/pM3EtlfB6hoaFlWs7069yI+9G1gNyLPg/3os/D/egzcS/u8HlU67OlRERERH5N5UbOY7VaGTduXKmzzMQ8+jzciz4P96PPxL24w+ehw1IiIiLiUbTnRkRERDyKyo2IiIh4FJUbERER8SgqN+IyceJEunTpQnBwMBEREQwYMIBdu3aZHUt+NmnSJCwWC48++qjZUWqsI0eOcNddd1GnTh0CAgJo27YtGzduNDtWjeRwOHjuuedo3LgxAQEBXH755bz00ktluzS/VIjVq1fTv39/oqOjsVgsfPrpp6WeNwyDsWPHUr9+fQICAujduzd79uypkmwqN+KyatUqRo4cyQ8//MBXX31FcXEx1113Hfn5+WZHq/E2bNjAv/71r9+8j5pUrtOnT9OjRw98fX35/PPP2b59O6+99hq1a9c2O1qNNHnyZN566y2mTZvGjh07mDx5Mq+88gr//Oc/zY5WY+Tn5xMfH8/06dMv+Pwrr7zCG2+8wYwZM1i3bh1BQUH06dOHwsLCSs+ms6Xkok6cOEFERASrVq3iyiuvNDtOjZWXl0fHjh158803efnll2nfvj1Tp041O1aNM3r0aNauXct3331ndhQBbrzxRiIjI5k5c6Zr3q233kpAQADz5883MVnNZLFYWLx4MQMGDADO7bWJjo7m8ccf54knngAgJyeHyMhI5syZw1/+8pdKzaM9N3JROTk5AISHh5ucpGYbOXIk/fr1o3fv3mZHqdGWLFlC586duf3224mIiKBDhw688847Zseqsbp3786KFSvYvXs3AFu2bGHNmjX07dvX5GQCcODAATIzM0v93goNDSUxMZGUlJRKf33T7y0l7snpdPLoo4/So0cP2rRpY3acGuuDDz5g06ZNbNiwwewoNd7+/ft56623SE5O5umnn2bDhg08/PDD+Pn5kZSUZHa8Gmf06NHYbDZatGiBt7c3DoeD8ePHM3jwYLOjCbhuav3LfSJ/ERkZ6XquMqncyAWNHDmSbdu2sWbNGrOj1FgZGRk88sgjfPXVV/j7+5sdp8ZzOp107tyZCRMmANChQwe2bdvGjBkzVG5M8OGHH7JgwQIWLlxI69atSUtL49FHHyU6Olqfh+iwlJxv1KhR/Pe//+Xbb7/VHdRNlJqayvHjx+nYsSM+Pj74+PiwatUq3njjDXx8fHA4HGZHrFHq169Pq1atSs1r2bIl6enpJiWq2Z588klGjx7NX/7yF9q2bcuQIUN47LHHmDhxotnRBIiKigIgKyur1PysrCzXc5VJ5UZcDMNg1KhRLF68mG+++YbGjRubHalGu+aaa9i6dStpaWmuqXPnzgwePJi0tDS8vb3Njlij9OjR47xLI+zevZtGjRqZlKhmKygowMur9FeYt7c3TqfTpETyvxo3bkxUVBQrVqxwzbPZbKxbt45u3bpV+uvrsJS4jBw5koULF/LZZ58RHBzsOi4aGhpKQECAyelqnuDg4PPGOwUFBVGnTh2NgzLBY489Rvfu3ZkwYQJ33HEH69ev5+233+btt982O1qN1L9/f8aPH0/Dhg1p3bo1mzdvZsqUKYwYMcLsaDVGXl4ee/fudT0+cOAAaWlphIeH07BhQx599FFefvllmjZtSuPGjXnuueeIjo52nVFVqQyRnwEXnGbPnm12NPnZVVddZTzyyCNmx6ix/vOf/xht2rQxrFar0aJFC+Ptt982O1KNZbPZjEceecRo2LCh4e/vb1x22WXGM888Y9jtdrOj1RjffvvtBb8zkpKSDMMwDKfTaTz33HNGZGSkYbVajWuuucbYtWtXlWTTdW5ERETEo2jMjYiIiHgUlRsRERHxKCo3IiIi4lFUbkRERMSjqNyIiIiIR1G5EREREY+iciMiIiIeReVGREREPIrKjYhUmIMHD2KxWEhLSzM7isvOnTvp2rUr/v7+tG/f/pLXd8f3JCK/TeVGxIMMGzYMi8XCpEmTSs3/9NNPsVgsJqUy17hx4wgKCmLXrl2lbuJnljlz5hAWFmZ2DBGPpnIj4mH8/f2ZPHkyp0+fNjtKhSkqKir3uvv27eOKK66gUaNG1KlTpwJTmcvhcOgO2CIXoXIj4mF69+5NVFQUEydOvOgyzz///HmHaKZOnUpcXJzr8bBhwxgwYAATJkwgMjKSsLAwXnzxRUpKSnjyyScJDw+nQYMGzJ49+7zt79y5k+7du+Pv70+bNm1YtWpVqee3bdtG3759qVWrFpGRkQwZMoTs7GzX83/6058YNWoUjz76KHXr1qVPnz4XfB9Op5MXX3yRBg0aYLVaad++PcuXL3c9b7FYSE1N5cUXX8RisfD8889fdDuvvPIKTZo0wWq10rBhQ8aPH3/BZS+05+XXe8a2bNlCr169CA4OJiQkhE6dOrFx40ZWrlzJ8OHDycnJwWKxlMpkt9t54okniImJISgoiMTERFauXHne6y5ZsoRWrVphtVpJT09n5cqVJCQkEBQURFhYGD169ODQoUMXzC5SU6jciHgYb29vJkyYwD//+U8OHz78h7b1zTffcPToUVavXs2UKVMYN24cN954I7Vr12bdunU88MAD3H///ee9zpNPPsnjjz/O5s2b6datG/379+fkyZMAnDlzhquvvpoOHTqwceNGli9fTlZWFnfccUepbcydOxc/Pz/Wrl3LjBkzLpjv9ddf57XXXuPvf/87P/74I3369OGmm25iz549ABw7dozWrVvz+OOPc+zYMZ544okLbmfMmDFMmjSJ5557ju3bt7Nw4UIiIyPL/XMbPHgwDRo0YMOGDaSmpjJ69Gh8fX3p3r07U6dOJSQkhGPHjpXKNGrUKFJSUvjggw/48ccfuf3227n++utd7wWgoKCAyZMn8+677/LTTz8RHh7OgAEDuOqqq/jxxx9JSUnhvvvuq7GHIEVcquTe4yJSJZKSkow///nPhmEYRteuXY0RI0YYhmEYixcvNv73n/u4ceOM+Pj4Uuv+4x//MBo1alRqW40aNTIcDodrXvPmzY2ePXu6HpeUlBhBQUHG+++/bxiGYRw4cMAAjEmTJrmWKS4uNho0aGBMnjzZMAzDeOmll4zrrruu1GtnZGQYgLFr1y7DMAzjqquuMjp06PC77zc6OtoYP358qXldunQxHnzwQdfj+Ph4Y9y4cRfdhs1mM6xWq/HOO+9c8Plf3tPmzZsNwzCM2bNnG6GhoaWW+fXPNzg42JgzZ84Ft3eh9Q8dOmR4e3sbR44cKTX/mmuuMcaMGeNaDzDS0tJcz588edIAjJUrV170/YnURNpzI+KhJk+ezNy5c9mxY0e5t9G6dWu8vP7/10RkZCRt27Z1Pfb29qZOnTocP3681HrdunVz/bePjw+dO3d25diyZQvffvsttWrVck0tWrQAzo2P+UWnTp1+M5vNZuPo0aP06NGj1PwePXpc0nvesWMHdruda665pszr/J7k5GTuueceevfuzaRJk0q9rwvZunUrDoeDZs2alfq5rFq1qtS6fn5+tGvXzvU4PDycYcOG0adPH/r378/rr7/OsWPHKux9iFRXKjciHurKK6+kT58+jBkz5rznvLy8MAyj1Lzi4uLzlvP19S312GKxXHDepQxszcvLo3///qSlpZWa9uzZw5VXXulaLigoqMzb/CMCAgIuafmy/Oyef/55fvrpJ/r168c333xDq1atWLx48UW3mZeXh7e3N6mpqaV+Jjt27OD1118vlfXXh5xmz55NSkoK3bt3Z9GiRTRr1owffvjhkt6TiKdRuRHxYJMmTeI///kPKSkppebXq1ePzMzMUl/SFXkdl//9ci0pKSE1NZWWLVsC0LFjR3766Sfi4uJo0qRJqelSCk1ISAjR0dGsXbu21Py1a9fSqlWrMm+nadOmBAQElPk08Xr16pGbm0t+fr5r3oV+ds2aNeOxxx7jyy+/5JZbbnENvPbz88PhcJRatkOHDjgcDo4fP37ezyQqKup3M3Xo0IExY8bw/fff06ZNGxYuXFim9yLiqVRuRDxY27ZtGTx4MG+88Uap+X/60584ceIEr7zyCvv27WP69Ol8/vnnFfa606dPZ/HixezcuZORI0dy+vRpRowYAcDIkSM5deoUd955Jxs2bGDfvn188cUXDB8+/Lwv/d/z5JNPMnnyZBYtWsSuXbsYPXo0aWlpPPLII2Xehr+/P0899RR/+9vfmDdvHvv27eOHH35g5syZF1w+MTGRwMBAnn76afbt28fChQuZM2eO6/mzZ88yatQoVq5cyaFDh1i7di0bNmxwlbu4uDjy8vJYsWIF2dnZFBQU0KxZMwYPHszQoUP55JNPOHDgAOvXr2fixIksXbr0otkPHDjAmDFjSElJ4dChQ3z55Zfs2bPH9VoiNZXKjYiHe/HFF887bNSyZUvefPNNpk+fTnx8POvXr7/omUTlMWnSJCZNmkR8fDxr1qxhyZIl1K1bF8C1t8XhcHDdddfRtm1bHn30UcLCwkqN7ymLhx9+mOTkZB5//HHatm3L8uXLWbJkCU2bNr2k7Tz33HM8/vjjjB07lpYtWzJw4MDzxhH9Ijw8nPnz57Ns2TLatm3L+++/X+oUc29vb06ePMnQoUNp1qwZd9xxB3379uWFF14AoHv37jzwwAMMHDiQevXq8corrwDnDi8NHTqUxx9/nObNmzNgwAA2bNhAw4YNL5o7MDCQnTt3cuutt9KsWTPuu+8+Ro4cyf33339J71/E01iMXx88FhEREanGtOdGREREPIrKjYiIiHgUlRsRERHxKCo3IiIi4lFUbkRERMSjqNyIiIiIR1G5EREREY+iciMiIiIeReVGREREPIrKjYiIiHgUlRsRERHxKCo3IiIi4lH+D/BMd9Rqye76AAAAAElFTkSuQmCC"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Расчет функции потерь для разного количества кластеров\n",
    "losses = []\n",
    "for num_clusters in range(1, 11):\n",
    "    kmeans = faiss.Kmeans(dims, num_clusters)\n",
    "    kmeans.train(df_base_scaled_values)\n",
    "    _, dist = idx_l2.search(kmeans.centroids, 1)\n",
    "    loss = np.sum(dist)\n",
    "    losses.append(loss)\n",
    "\n",
    "# Построение графика значения функции потерь\n",
    "plt.plot(range(1, 11), losses)\n",
    "plt.xlabel('Number of clusters')\n",
    "plt.ylabel('Loss')\n",
    "plt.show()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "На графике видно, что с увеличением количества кластеров потери данных возрастают линейно. Поэтому можно сделать вывод, что нас интересует один единственный кластер (или максимум 2 - для увеличения скорости обучения). В данной задаче я буду использовать один кластер."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Поиск ближайших соседей и оценка точности без модели градиентного бустинга"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "outputs": [],
   "source": [
    "# Создаём словарь с индексами base\n",
    "base_index = {k: v for k, v in enumerate(df_base.index.to_list())}"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "outputs": [],
   "source": [
    "targets = df_train[\"Target\"]\n",
    "df_train.drop(\"Target\", axis=1, inplace=True)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "outputs": [
    {
     "data": {
      "text/plain": "array([[ 1.2995186 ,  1.9968884 ,  0.06377414, ...,  0.9065748 ,\n         0.99526674,  0.5229633 ],\n       [-0.06214603, -0.25715932,  0.32424858, ...,  0.70945865,\n        -0.61168975, -0.08349097],\n       [ 1.4563276 , -0.85569566, -1.851792  , ...,  0.30057615,\n        -0.7138468 ,  0.6046772 ],\n       ...,\n       [ 1.2968874 , -0.67694914,  0.3960963 , ..., -0.18653943,\n         0.219093  ,  1.0424612 ],\n       [ 0.3578639 ,  0.41189885, -0.50254726, ..., -0.22969231,\n        -0.7138468 , -0.7594065 ],\n       [-0.11325181,  0.5770834 , -0.22327293, ..., -0.70412654,\n         1.9609631 , -0.77516645]], dtype=float32)"
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train_scaled = pd.DataFrame(scaler.transform(df_train), index=df_train.index)\n",
    "train_values = np.ascontiguousarray(df_train_scaled).astype('float32')\n",
    "train_values"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: total: 9h 22min 36s\n",
      "Wall time: 38min 19s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# Проводим поиск ближайших соседей на тренировочных данных\n",
    "vecs, idx = idx_l2.search(train_values, 20)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "72.99\n"
     ]
    }
   ],
   "source": [
    "# Оценим точность acc@20\n",
    "acc = 0\n",
    "for target, el in zip(targets.values.tolist(), idx.tolist()):\n",
    "    acc += int(target in [base_index[r] for r in el])\n",
    "\n",
    "print(100 * acc / len(idx))"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Вывод: При поиске 20 ближайших соседей - только 73% товара, подобранных экспертами попадают в их число. Если отобрать из этих 20 первые 5 и измерить точность acc5, процентаж падает ниже 50. Поэтому следует обучить модель градиентного бустинга (для нашей задачи отлично подойдёт CatBoostClassifier), которая будет из этих 20 вариантов отбирать не просто первые 5 шт, а наиболее вероятные, так точно acc@5 будет приближаться к точности как при поиске 20 ближайших соседей (73%)."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Градиентный бустинг CatBoost"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Подготовка данных для модели CatBoostClassifier"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: total: 13min 36s\n",
      "Wall time: 13min 36s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "df_distances = pd.DataFrame()\n",
    "\n",
    "# для каждого запроса из обучающего набора данных\n",
    "for i in range(df_train.shape[0]):\n",
    "    # берем расстояния от запроса до рекомендаций FAISS от первых 10 элементов.\n",
    "    temp_df = pd.concat([pd.DataFrame(idx[i][:10]), pd.DataFrame(vecs[i][:10])], axis=1)\n",
    "\n",
    "    temp_df.columns = ['idx', 'distance']\n",
    "\n",
    "    # индекс вектора-запроса для получения его координат\n",
    "    temp_df['query_idx'] = f'{i}-query'\n",
    "\n",
    "    # индекс вектора рекомендаций экспертов\n",
    "    temp_df['target_idx'] = targets[i]\n",
    "\n",
    "    df_distances = pd.concat([df_distances, temp_df], ignore_index=True)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "outputs": [
    {
     "data": {
      "text/plain": "            idx   distance    query_idx    target_idx      base_idx  target  \\\n999988  1430547   5.967613  99998-query     9252-base  1861685-base       0   \n999990  1958647   1.949338  99999-query  2769109-base  2769109-base       1   \n999992  1831967  45.285156  99999-query  2769109-base  2539368-base       0   \n999996  1138466  50.704590  99999-query  2769109-base  1412044-base       0   \n999999    48897  53.081379  99999-query  2769109-base    49440-base       0   \n\n             0_x       1_x       2_x       3_x  ...      62_y      63_y  \\\n999988  0.357864  0.411899 -0.502547  1.552935  ... -3.591553  1.338642   \n999990 -0.113252  0.577083 -0.223273  0.123277  ... -0.076096 -0.973159   \n999992 -0.113252  0.577083 -0.223273  0.123277  ... -0.152696 -0.822039   \n999996 -0.113252  0.577083 -0.223273  0.123277  ... -0.372890 -1.902764   \n999999 -0.113252  0.577083 -0.223273  0.123277  ... -0.417008  0.430435   \n\n            64_y      65_y      66_y      67_y      68_y      69_y      70_y  \\\n999988 -0.507661 -1.185903 -1.138544 -1.053340 -0.247918 -0.135820 -0.713847   \n999990  0.485453 -0.940010 -0.497467  0.398658  0.897109 -0.704312  1.960963   \n999992  0.676110 -0.900116  0.084016  0.936706  0.872203  1.067114  2.102800   \n999996 -0.010069  0.963151  0.875052  0.511212  0.674809 -1.445297  0.869343   \n999999  0.103303 -0.144794  0.068380 -0.561988  0.732958  1.572562  1.604687   \n\n            71_y  \n999988 -0.505661  \n999990 -0.775120  \n999992 -0.747574  \n999996 -0.565825  \n999999 -0.528468  \n\n[5 rows x 150 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>target</th>\n      <th>0_x</th>\n      <th>1_x</th>\n      <th>2_x</th>\n      <th>3_x</th>\n      <th>...</th>\n      <th>62_y</th>\n      <th>63_y</th>\n      <th>64_y</th>\n      <th>65_y</th>\n      <th>66_y</th>\n      <th>67_y</th>\n      <th>68_y</th>\n      <th>69_y</th>\n      <th>70_y</th>\n      <th>71_y</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>999988</th>\n      <td>1430547</td>\n      <td>5.967613</td>\n      <td>99998-query</td>\n      <td>9252-base</td>\n      <td>1861685-base</td>\n      <td>0</td>\n      <td>0.357864</td>\n      <td>0.411899</td>\n      <td>-0.502547</td>\n      <td>1.552935</td>\n      <td>...</td>\n      <td>-3.591553</td>\n      <td>1.338642</td>\n      <td>-0.507661</td>\n      <td>-1.185903</td>\n      <td>-1.138544</td>\n      <td>-1.053340</td>\n      <td>-0.247918</td>\n      <td>-0.135820</td>\n      <td>-0.713847</td>\n      <td>-0.505661</td>\n    </tr>\n    <tr>\n      <th>999990</th>\n      <td>1958647</td>\n      <td>1.949338</td>\n      <td>99999-query</td>\n      <td>2769109-base</td>\n      <td>2769109-base</td>\n      <td>1</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>...</td>\n      <td>-0.076096</td>\n      <td>-0.973159</td>\n      <td>0.485453</td>\n      <td>-0.940010</td>\n      <td>-0.497467</td>\n      <td>0.398658</td>\n      <td>0.897109</td>\n      <td>-0.704312</td>\n      <td>1.960963</td>\n      <td>-0.775120</td>\n    </tr>\n    <tr>\n      <th>999992</th>\n      <td>1831967</td>\n      <td>45.285156</td>\n      <td>99999-query</td>\n      <td>2769109-base</td>\n      <td>2539368-base</td>\n      <td>0</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>...</td>\n      <td>-0.152696</td>\n      <td>-0.822039</td>\n      <td>0.676110</td>\n      <td>-0.900116</td>\n      <td>0.084016</td>\n      <td>0.936706</td>\n      <td>0.872203</td>\n      <td>1.067114</td>\n      <td>2.102800</td>\n      <td>-0.747574</td>\n    </tr>\n    <tr>\n      <th>999996</th>\n      <td>1138466</td>\n      <td>50.704590</td>\n      <td>99999-query</td>\n      <td>2769109-base</td>\n      <td>1412044-base</td>\n      <td>0</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>...</td>\n      <td>-0.372890</td>\n      <td>-1.902764</td>\n      <td>-0.010069</td>\n      <td>0.963151</td>\n      <td>0.875052</td>\n      <td>0.511212</td>\n      <td>0.674809</td>\n      <td>-1.445297</td>\n      <td>0.869343</td>\n      <td>-0.565825</td>\n    </tr>\n    <tr>\n      <th>999999</th>\n      <td>48897</td>\n      <td>53.081379</td>\n      <td>99999-query</td>\n      <td>2769109-base</td>\n      <td>49440-base</td>\n      <td>0</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>...</td>\n      <td>-0.417008</td>\n      <td>0.430435</td>\n      <td>0.103303</td>\n      <td>-0.144794</td>\n      <td>0.068380</td>\n      <td>-0.561988</td>\n      <td>0.732958</td>\n      <td>1.572562</td>\n      <td>1.604687</td>\n      <td>-0.528468</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 150 columns</p>\n</div>"
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_distances.tail()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "outputs": [],
   "source": [
    "# функция для получения индекса рекомендаций в базовом наборе товаров по индексу FAISS\n",
    "def get_base_idx(row):\n",
    "    return base_index[row['idx']]"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "outputs": [
    {
     "data": {
      "text/plain": "       idx   distance query_idx   target_idx      base_idx\n0   598613  19.299540   0-query  675816-base   675816-base\n1   755584  19.467600   0-query  675816-base   877519-base\n2   336969  20.747215   0-query  675816-base   361564-base\n3  1934845  23.199968   0-query  675816-base  2725256-base\n4    13374  23.545162   0-query  675816-base    13406-base",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>598613</td>\n      <td>19.299540</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>675816-base</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>755584</td>\n      <td>19.467600</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>877519-base</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>336969</td>\n      <td>20.747215</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>361564-base</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1934845</td>\n      <td>23.199968</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>2725256-base</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>13374</td>\n      <td>23.545162</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>13406-base</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# по внутреннему индексу FAISS восстанавливаем индекс рекомендованного товара в базовом наборе\n",
    "df_distances['base_idx'] = df_distances.apply(get_base_idx, axis=1)\n",
    "df_distances.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "outputs": [
    {
     "data": {
      "text/plain": "       idx   distance query_idx   target_idx      base_idx  target\n0   598613  19.299540   0-query  675816-base   675816-base       1\n1   755584  19.467600   0-query  675816-base   877519-base       0\n2   336969  20.747215   0-query  675816-base   361564-base       0\n3  1934845  23.199968   0-query  675816-base  2725256-base       0\n4    13374  23.545162   0-query  675816-base    13406-base       0",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>target</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>598613</td>\n      <td>19.299540</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>675816-base</td>\n      <td>1</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>755584</td>\n      <td>19.467600</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>877519-base</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>336969</td>\n      <td>20.747215</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>361564-base</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1934845</td>\n      <td>23.199968</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>2725256-base</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>13374</td>\n      <td>23.545162</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>13406-base</td>\n      <td>0</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_distances['target'] = df_distances['target_idx'] == df_distances['base_idx']\n",
    "df_distances['target'] = df_distances['target'].astype('int')\n",
    "df_distances.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "outputs": [
    {
     "data": {
      "text/plain": "       idx   distance query_idx   target_idx      base_idx  target         0  \\\n0   598613  19.299540   0-query  675816-base   675816-base       1  1.299519   \n1   755584  19.467600   0-query  675816-base   877519-base       0  1.299519   \n2   336969  20.747215   0-query  675816-base   361564-base       0  1.299519   \n3  1934845  23.199968   0-query  675816-base  2725256-base       0  1.299519   \n4    13374  23.545162   0-query  675816-base    13406-base       0  1.299519   \n\n          1         2         3  ...        62        63       64        65  \\\n0  1.996888  0.063774 -1.879671  ... -0.866975  1.274319 -0.02441 -1.173481   \n1  1.996888  0.063774 -1.879671  ... -0.866975  1.274319 -0.02441 -1.173481   \n2  1.996888  0.063774 -1.879671  ... -0.866975  1.274319 -0.02441 -1.173481   \n3  1.996888  0.063774 -1.879671  ... -0.866975  1.274319 -0.02441 -1.173481   \n4  1.996888  0.063774 -1.879671  ... -0.866975  1.274319 -0.02441 -1.173481   \n\n         66        67        68        69        70        71  \n0 -1.035388  0.197184 -0.200786  0.906575  0.995267  0.522963  \n1 -1.035388  0.197184 -0.200786  0.906575  0.995267  0.522963  \n2 -1.035388  0.197184 -0.200786  0.906575  0.995267  0.522963  \n3 -1.035388  0.197184 -0.200786  0.906575  0.995267  0.522963  \n4 -1.035388  0.197184 -0.200786  0.906575  0.995267  0.522963  \n\n[5 rows x 78 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>target</th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n      <th>3</th>\n      <th>...</th>\n      <th>62</th>\n      <th>63</th>\n      <th>64</th>\n      <th>65</th>\n      <th>66</th>\n      <th>67</th>\n      <th>68</th>\n      <th>69</th>\n      <th>70</th>\n      <th>71</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>598613</td>\n      <td>19.299540</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>675816-base</td>\n      <td>1</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.866975</td>\n      <td>1.274319</td>\n      <td>-0.02441</td>\n      <td>-1.173481</td>\n      <td>-1.035388</td>\n      <td>0.197184</td>\n      <td>-0.200786</td>\n      <td>0.906575</td>\n      <td>0.995267</td>\n      <td>0.522963</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>755584</td>\n      <td>19.467600</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>877519-base</td>\n      <td>0</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.866975</td>\n      <td>1.274319</td>\n      <td>-0.02441</td>\n      <td>-1.173481</td>\n      <td>-1.035388</td>\n      <td>0.197184</td>\n      <td>-0.200786</td>\n      <td>0.906575</td>\n      <td>0.995267</td>\n      <td>0.522963</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>336969</td>\n      <td>20.747215</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>361564-base</td>\n      <td>0</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.866975</td>\n      <td>1.274319</td>\n      <td>-0.02441</td>\n      <td>-1.173481</td>\n      <td>-1.035388</td>\n      <td>0.197184</td>\n      <td>-0.200786</td>\n      <td>0.906575</td>\n      <td>0.995267</td>\n      <td>0.522963</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>1934845</td>\n      <td>23.199968</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>2725256-base</td>\n      <td>0</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.866975</td>\n      <td>1.274319</td>\n      <td>-0.02441</td>\n      <td>-1.173481</td>\n      <td>-1.035388</td>\n      <td>0.197184</td>\n      <td>-0.200786</td>\n      <td>0.906575</td>\n      <td>0.995267</td>\n      <td>0.522963</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>13374</td>\n      <td>23.545162</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>13406-base</td>\n      <td>0</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.866975</td>\n      <td>1.274319</td>\n      <td>-0.02441</td>\n      <td>-1.173481</td>\n      <td>-1.035388</td>\n      <td>0.197184</td>\n      <td>-0.200786</td>\n      <td>0.906575</td>\n      <td>0.995267</td>\n      <td>0.522963</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 78 columns</p>\n</div>"
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# добавляем координаты векторов-запросов\n",
    "df_distances = df_distances.merge(df_train_scaled, how='inner', left_on='query_idx', right_index=True)\n",
    "df_distances.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "outputs": [
    {
     "data": {
      "text/plain": "           idx   distance    query_idx   target_idx     base_idx  target  \\\n0       598613  19.299540      0-query  675816-base  675816-base       1   \n1       755584  19.467600      0-query  675816-base  877519-base       0   \n9011    755584  40.878693    901-query  161242-base  877519-base       0   \n167470  755584  18.697302  16747-query  279936-base  877519-base       0   \n289074  755584  30.626114  28907-query  674092-base  877519-base       0   \n\n             0_x       1_x       2_x       3_x  ...      62_y      63_y  \\\n0       1.299519  1.996888  0.063774 -1.879671  ... -0.927103  1.627806   \n1       1.299519  1.996888  0.063774 -1.879671  ... -0.676032  1.615281   \n9011    1.916994  1.153344 -0.199909 -1.319427  ... -0.676032  1.615281   \n167470  0.429194  1.314741 -0.513722 -1.220226  ... -0.676032  1.615281   \n289074  0.524340  2.303420  0.334275 -1.683806  ... -0.676032  1.615281   \n\n            64_y      65_y      66_y      67_y      68_y      69_y      70_y  \\\n0       0.072924  0.427460 -0.496641  0.394758 -0.568302  0.066837  0.995267   \n1      -0.247046 -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932   \n9011   -0.247046 -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932   \n167470 -0.247046 -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932   \n289074 -0.247046 -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932   \n\n            71_y  \n0       0.708902  \n1       0.270560  \n9011    0.270560  \n167470  0.270560  \n289074  0.270560  \n\n[5 rows x 150 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>target</th>\n      <th>0_x</th>\n      <th>1_x</th>\n      <th>2_x</th>\n      <th>3_x</th>\n      <th>...</th>\n      <th>62_y</th>\n      <th>63_y</th>\n      <th>64_y</th>\n      <th>65_y</th>\n      <th>66_y</th>\n      <th>67_y</th>\n      <th>68_y</th>\n      <th>69_y</th>\n      <th>70_y</th>\n      <th>71_y</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>598613</td>\n      <td>19.299540</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>675816-base</td>\n      <td>1</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.927103</td>\n      <td>1.627806</td>\n      <td>0.072924</td>\n      <td>0.427460</td>\n      <td>-0.496641</td>\n      <td>0.394758</td>\n      <td>-0.568302</td>\n      <td>0.066837</td>\n      <td>0.995267</td>\n      <td>0.708902</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>755584</td>\n      <td>19.467600</td>\n      <td>0-query</td>\n      <td>675816-base</td>\n      <td>877519-base</td>\n      <td>0</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>9011</th>\n      <td>755584</td>\n      <td>40.878693</td>\n      <td>901-query</td>\n      <td>161242-base</td>\n      <td>877519-base</td>\n      <td>0</td>\n      <td>1.916994</td>\n      <td>1.153344</td>\n      <td>-0.199909</td>\n      <td>-1.319427</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>167470</th>\n      <td>755584</td>\n      <td>18.697302</td>\n      <td>16747-query</td>\n      <td>279936-base</td>\n      <td>877519-base</td>\n      <td>0</td>\n      <td>0.429194</td>\n      <td>1.314741</td>\n      <td>-0.513722</td>\n      <td>-1.220226</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>289074</th>\n      <td>755584</td>\n      <td>30.626114</td>\n      <td>28907-query</td>\n      <td>674092-base</td>\n      <td>877519-base</td>\n      <td>0</td>\n      <td>0.524340</td>\n      <td>2.303420</td>\n      <td>0.334275</td>\n      <td>-1.683806</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 150 columns</p>\n</div>"
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# добавляем координаты векторов-рекомендаций FAISS\n",
    "df_distances = df_distances.merge(df_base_scaled, how='inner', left_on='base_idx', right_index=True)\n",
    "df_distances.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "outputs": [
    {
     "data": {
      "text/plain": "         distance       0_x       1_x       2_x       3_x       4_x       5_x  \\\n0       19.299540  1.299519  1.996888  0.063774 -1.879671  1.644100 -0.537626   \n1       19.467600  1.299519  1.996888  0.063774 -1.879671  1.644100 -0.537626   \n9011    40.878693  1.916994  1.153344 -0.199909 -1.319427 -0.193553 -0.269549   \n167470  18.697302  0.429194  1.314741 -0.513722 -1.220226  1.779113  0.047929   \n289074  30.626114  0.524340  2.303420  0.334275 -1.683806  1.698728 -0.685473   \n...           ...       ...       ...       ...       ...       ...       ...   \n999988   5.967613  0.357864  0.411899 -0.502547  1.552935  0.622546 -0.811189   \n999990   1.949338 -0.113252  0.577083 -0.223273  0.123277  0.693205 -0.521279   \n999992  45.285156 -0.113252  0.577083 -0.223273  0.123277  0.693205 -0.521279   \n999996  50.704590 -0.113252  0.577083 -0.223273  0.123277  0.693205 -0.521279   \n999999  53.081379 -0.113252  0.577083 -0.223273  0.123277  0.693205 -0.521279   \n\n             6_x       7_x       8_x  ...      62_y      63_y      64_y  \\\n0       0.165148  0.279598 -2.296794  ... -0.927103  1.627806  0.072924   \n1       0.165148  0.279598 -2.296794  ... -0.676032  1.615281 -0.247046   \n9011   -1.287975  0.345101 -0.892164  ... -0.676032  1.615281 -0.247046   \n167470  1.152288  0.592973 -2.010040  ... -0.676032  1.615281 -0.247046   \n289074 -1.404976 -0.105586 -1.860783  ... -0.676032  1.615281 -0.247046   \n...          ...       ...       ...  ...       ...       ...       ...   \n999988 -1.047148  1.632550 -0.897114  ... -3.591553  1.338642 -0.507661   \n999990  1.149390  0.885642 -0.902251  ... -0.076096 -0.973159  0.485453   \n999992  1.149390  0.885642 -0.902251  ... -0.152696 -0.822039  0.676110   \n999996  1.149390  0.885642 -0.902251  ... -0.372890 -1.902764 -0.010069   \n999999  1.149390  0.885642 -0.902251  ... -0.417008  0.430435  0.103303   \n\n            65_y      66_y      67_y      68_y      69_y      70_y      71_y  \n0       0.427460 -0.496641  0.394758 -0.568302  0.066837  0.995267  0.708902  \n1      -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932  0.270560  \n9011   -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932  0.270560  \n167470 -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932  0.270560  \n289074 -0.404396 -1.148503  0.412697 -0.375744  0.728224  0.544932  0.270560  \n...          ...       ...       ...       ...       ...       ...       ...  \n999988 -1.185903 -1.138544 -1.053340 -0.247918 -0.135820 -0.713847 -0.505661  \n999990 -0.940010 -0.497467  0.398658  0.897109 -0.704312  1.960963 -0.775120  \n999992 -0.900116  0.084016  0.936706  0.872203  1.067114  2.102800 -0.747574  \n999996  0.963151  0.875052  0.511212  0.674809 -1.445297  0.869343 -0.565825  \n999999 -0.144794  0.068380 -0.561988  0.732958  1.572562  1.604687 -0.528468  \n\n[1000000 rows x 145 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>distance</th>\n      <th>0_x</th>\n      <th>1_x</th>\n      <th>2_x</th>\n      <th>3_x</th>\n      <th>4_x</th>\n      <th>5_x</th>\n      <th>6_x</th>\n      <th>7_x</th>\n      <th>8_x</th>\n      <th>...</th>\n      <th>62_y</th>\n      <th>63_y</th>\n      <th>64_y</th>\n      <th>65_y</th>\n      <th>66_y</th>\n      <th>67_y</th>\n      <th>68_y</th>\n      <th>69_y</th>\n      <th>70_y</th>\n      <th>71_y</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>19.299540</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>1.644100</td>\n      <td>-0.537626</td>\n      <td>0.165148</td>\n      <td>0.279598</td>\n      <td>-2.296794</td>\n      <td>...</td>\n      <td>-0.927103</td>\n      <td>1.627806</td>\n      <td>0.072924</td>\n      <td>0.427460</td>\n      <td>-0.496641</td>\n      <td>0.394758</td>\n      <td>-0.568302</td>\n      <td>0.066837</td>\n      <td>0.995267</td>\n      <td>0.708902</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>19.467600</td>\n      <td>1.299519</td>\n      <td>1.996888</td>\n      <td>0.063774</td>\n      <td>-1.879671</td>\n      <td>1.644100</td>\n      <td>-0.537626</td>\n      <td>0.165148</td>\n      <td>0.279598</td>\n      <td>-2.296794</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>9011</th>\n      <td>40.878693</td>\n      <td>1.916994</td>\n      <td>1.153344</td>\n      <td>-0.199909</td>\n      <td>-1.319427</td>\n      <td>-0.193553</td>\n      <td>-0.269549</td>\n      <td>-1.287975</td>\n      <td>0.345101</td>\n      <td>-0.892164</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>167470</th>\n      <td>18.697302</td>\n      <td>0.429194</td>\n      <td>1.314741</td>\n      <td>-0.513722</td>\n      <td>-1.220226</td>\n      <td>1.779113</td>\n      <td>0.047929</td>\n      <td>1.152288</td>\n      <td>0.592973</td>\n      <td>-2.010040</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>289074</th>\n      <td>30.626114</td>\n      <td>0.524340</td>\n      <td>2.303420</td>\n      <td>0.334275</td>\n      <td>-1.683806</td>\n      <td>1.698728</td>\n      <td>-0.685473</td>\n      <td>-1.404976</td>\n      <td>-0.105586</td>\n      <td>-1.860783</td>\n      <td>...</td>\n      <td>-0.676032</td>\n      <td>1.615281</td>\n      <td>-0.247046</td>\n      <td>-0.404396</td>\n      <td>-1.148503</td>\n      <td>0.412697</td>\n      <td>-0.375744</td>\n      <td>0.728224</td>\n      <td>0.544932</td>\n      <td>0.270560</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>999988</th>\n      <td>5.967613</td>\n      <td>0.357864</td>\n      <td>0.411899</td>\n      <td>-0.502547</td>\n      <td>1.552935</td>\n      <td>0.622546</td>\n      <td>-0.811189</td>\n      <td>-1.047148</td>\n      <td>1.632550</td>\n      <td>-0.897114</td>\n      <td>...</td>\n      <td>-3.591553</td>\n      <td>1.338642</td>\n      <td>-0.507661</td>\n      <td>-1.185903</td>\n      <td>-1.138544</td>\n      <td>-1.053340</td>\n      <td>-0.247918</td>\n      <td>-0.135820</td>\n      <td>-0.713847</td>\n      <td>-0.505661</td>\n    </tr>\n    <tr>\n      <th>999990</th>\n      <td>1.949338</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>0.693205</td>\n      <td>-0.521279</td>\n      <td>1.149390</td>\n      <td>0.885642</td>\n      <td>-0.902251</td>\n      <td>...</td>\n      <td>-0.076096</td>\n      <td>-0.973159</td>\n      <td>0.485453</td>\n      <td>-0.940010</td>\n      <td>-0.497467</td>\n      <td>0.398658</td>\n      <td>0.897109</td>\n      <td>-0.704312</td>\n      <td>1.960963</td>\n      <td>-0.775120</td>\n    </tr>\n    <tr>\n      <th>999992</th>\n      <td>45.285156</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>0.693205</td>\n      <td>-0.521279</td>\n      <td>1.149390</td>\n      <td>0.885642</td>\n      <td>-0.902251</td>\n      <td>...</td>\n      <td>-0.152696</td>\n      <td>-0.822039</td>\n      <td>0.676110</td>\n      <td>-0.900116</td>\n      <td>0.084016</td>\n      <td>0.936706</td>\n      <td>0.872203</td>\n      <td>1.067114</td>\n      <td>2.102800</td>\n      <td>-0.747574</td>\n    </tr>\n    <tr>\n      <th>999996</th>\n      <td>50.704590</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>0.693205</td>\n      <td>-0.521279</td>\n      <td>1.149390</td>\n      <td>0.885642</td>\n      <td>-0.902251</td>\n      <td>...</td>\n      <td>-0.372890</td>\n      <td>-1.902764</td>\n      <td>-0.010069</td>\n      <td>0.963151</td>\n      <td>0.875052</td>\n      <td>0.511212</td>\n      <td>0.674809</td>\n      <td>-1.445297</td>\n      <td>0.869343</td>\n      <td>-0.565825</td>\n    </tr>\n    <tr>\n      <th>999999</th>\n      <td>53.081379</td>\n      <td>-0.113252</td>\n      <td>0.577083</td>\n      <td>-0.223273</td>\n      <td>0.123277</td>\n      <td>0.693205</td>\n      <td>-0.521279</td>\n      <td>1.149390</td>\n      <td>0.885642</td>\n      <td>-0.902251</td>\n      <td>...</td>\n      <td>-0.417008</td>\n      <td>0.430435</td>\n      <td>0.103303</td>\n      <td>-0.144794</td>\n      <td>0.068380</td>\n      <td>-0.561988</td>\n      <td>0.732958</td>\n      <td>1.572562</td>\n      <td>1.604687</td>\n      <td>-0.528468</td>\n    </tr>\n  </tbody>\n</table>\n<p>1000000 rows × 145 columns</p>\n</div>"
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Финальные характеристики для обучения\n",
    "df_features = df_distances.drop(['idx', 'query_idx', 'target_idx', 'base_idx', 'target'], axis=1)\n",
    "df_features"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Вывод: Так получились фичи для обчучения модели. Повторный скейлинг проводить не стоит, т.к. расстояние между векторами хоть и велико и сильно выбивается из остальных характеристик датафрейма, однако оно само по себе является основной характеристикой для обучения."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Дисбаланс классов и борьба с дисбалансом"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Так как мы присоединяли 10 ближайших соседей, из которых далеко не в каждой группе было искомое значение. Поэтому стоит ожидать сильнейшего дисбаланса классов, которого стоит устранить."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "outputs": [
    {
     "data": {
      "text/plain": "0    0.930211\n1    0.069789\nName: target, dtype: float64"
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Выясним имеющийся дисбаланс классов\n",
    "df_target = df_distances['target']\n",
    "df_target.value_counts() / df_target.shape[0]"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Для устранения такого дисбаланса используем уже готовый сэмплер SMOTE из библиотеки imblearn. Также упакуем весь процесс в пайплайн из той же библиотеки."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "outputs": [],
   "source": [
    "df_pipeline = make_pipeline(\n",
    "    SMOTE(random_state=32123),\n",
    "    CatBoostClassifier(auto_class_weights = 'Balanced', verbose=50)\n",
    ")"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Learning rate set to 0.256483\n",
      "0:\tlearn: 0.5704084\ttotal: 489ms\tremaining: 8m 8s\n",
      "50:\tlearn: 0.2311235\ttotal: 14.8s\tremaining: 4m 34s\n",
      "100:\tlearn: 0.1903140\ttotal: 28.2s\tremaining: 4m 11s\n",
      "150:\tlearn: 0.1667686\ttotal: 41.9s\tremaining: 3m 55s\n",
      "200:\tlearn: 0.1516256\ttotal: 55.3s\tremaining: 3m 39s\n",
      "250:\tlearn: 0.1381000\ttotal: 1m 8s\tremaining: 3m 25s\n",
      "300:\tlearn: 0.1291825\ttotal: 1m 22s\tremaining: 3m 11s\n",
      "350:\tlearn: 0.1226270\ttotal: 1m 35s\tremaining: 2m 56s\n",
      "400:\tlearn: 0.1153106\ttotal: 1m 49s\tremaining: 2m 43s\n",
      "450:\tlearn: 0.1091781\ttotal: 2m 3s\tremaining: 2m 30s\n",
      "500:\tlearn: 0.1041827\ttotal: 2m 17s\tremaining: 2m 17s\n",
      "550:\tlearn: 0.0995217\ttotal: 2m 31s\tremaining: 2m 3s\n",
      "600:\tlearn: 0.0956310\ttotal: 2m 45s\tremaining: 1m 49s\n",
      "650:\tlearn: 0.0906413\ttotal: 2m 59s\tremaining: 1m 36s\n",
      "700:\tlearn: 0.0873879\ttotal: 3m 13s\tremaining: 1m 22s\n",
      "750:\tlearn: 0.0839171\ttotal: 3m 27s\tremaining: 1m 8s\n",
      "800:\tlearn: 0.0812089\ttotal: 3m 41s\tremaining: 54.9s\n",
      "850:\tlearn: 0.0775747\ttotal: 3m 56s\tremaining: 41.3s\n",
      "900:\tlearn: 0.0745808\ttotal: 4m 13s\tremaining: 27.8s\n",
      "950:\tlearn: 0.0719780\ttotal: 4m 27s\tremaining: 13.8s\n",
      "999:\tlearn: 0.0700664\ttotal: 4m 41s\tremaining: 0us\n",
      "CPU times: total: 1h 4min 42s\n",
      "Wall time: 5min\n"
     ]
    },
    {
     "data": {
      "text/plain": "Pipeline(steps=[('smote', SMOTE(random_state=32123)),\n                ('catboostclassifier',\n                 <catboost.core.CatBoostClassifier object at 0x000001F5C3D2FFD0>)])",
      "text/html": "<style>#sk-container-id-2 {color: black;background-color: white;}#sk-container-id-2 pre{padding: 0;}#sk-container-id-2 div.sk-toggleable {background-color: white;}#sk-container-id-2 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-2 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-2 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-2 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-2 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-2 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-2 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-2 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-2 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-2 div.sk-item {position: relative;z-index: 1;}#sk-container-id-2 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-2 div.sk-item::before, #sk-container-id-2 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-2 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-2 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-2 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-2 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-2 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-2 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-2 div.sk-label-container {text-align: center;}#sk-container-id-2 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-2 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[(&#x27;smote&#x27;, SMOTE(random_state=32123)),\n                (&#x27;catboostclassifier&#x27;,\n                 &lt;catboost.core.CatBoostClassifier object at 0x000001F5C3D2FFD0&gt;)])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" ><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[(&#x27;smote&#x27;, SMOTE(random_state=32123)),\n                (&#x27;catboostclassifier&#x27;,\n                 &lt;catboost.core.CatBoostClassifier object at 0x000001F5C3D2FFD0&gt;)])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-3\" type=\"checkbox\" ><label for=\"sk-estimator-id-3\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">SMOTE</label><div class=\"sk-toggleable__content\"><pre>SMOTE(random_state=32123)</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-4\" type=\"checkbox\" ><label for=\"sk-estimator-id-4\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">CatBoostClassifier</label><div class=\"sk-toggleable__content\"><pre>&lt;catboost.core.CatBoostClassifier object at 0x000001F5C3D2FFD0&gt;</pre></div></div></div></div></div></div></div>"
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "df_pipeline.fit(df_features, df_target)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "outputs": [],
   "source": [
    "df_validation_scaled = pd.DataFrame(scaler.transform(df_validation), index=df_validation.index)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "outputs": [],
   "source": [
    "# проводим поиск с помощью FAISS\n",
    "val_vecs, val_idx = idx_l2.search(np.ascontiguousarray(df_validation_scaled.values).astype('float32'), 20)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "72.976\n"
     ]
    }
   ],
   "source": [
    "# accuracy@20\n",
    "acc = 0\n",
    "for target, el in zip(df_validation_answer.values.tolist(), val_idx.tolist()):\n",
    "    acc += int(target[0] in [base_index[r] for r in el])\n",
    "\n",
    "print(100 * acc / len(idx))"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "outputs": [
    {
     "data": {
      "text/plain": "            idx   distance     query_idx   target_idx\n999995  1811737  36.874481  199999-query  336472-base\n999996    58213  38.584469  199999-query  336472-base\n999997  1096879  38.665546  199999-query  336472-base\n999998    83340  39.445202  199999-query  336472-base\n999999   315821  40.044304  199999-query  336472-base",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>999995</th>\n      <td>1811737</td>\n      <td>36.874481</td>\n      <td>199999-query</td>\n      <td>336472-base</td>\n    </tr>\n    <tr>\n      <th>999996</th>\n      <td>58213</td>\n      <td>38.584469</td>\n      <td>199999-query</td>\n      <td>336472-base</td>\n    </tr>\n    <tr>\n      <th>999997</th>\n      <td>1096879</td>\n      <td>38.665546</td>\n      <td>199999-query</td>\n      <td>336472-base</td>\n    </tr>\n    <tr>\n      <th>999998</th>\n      <td>83340</td>\n      <td>39.445202</td>\n      <td>199999-query</td>\n      <td>336472-base</td>\n    </tr>\n    <tr>\n      <th>999999</th>\n      <td>315821</td>\n      <td>40.044304</td>\n      <td>199999-query</td>\n      <td>336472-base</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_distances_cb = pd.DataFrame()\n",
    "\n",
    "for i in range(df_validation.shape[0]):\n",
    "    df = pd.concat([pd.DataFrame(val_idx[i][:10]), pd.DataFrame(val_vecs[i][:10])], axis=1)\n",
    "\n",
    "    df.columns = ['idx', 'distance']\n",
    "    df['query_idx'] = f'{100000 + i}-query'\n",
    "    df['target_idx'] = df_validation_answer.values[i][0]\n",
    "    df_distances_cb = pd.concat([df_distances_cb, df], ignore_index=True)\n",
    "df_distances_cb.tail()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "outputs": [
    {
     "data": {
      "text/plain": "       idx   distance     query_idx    target_idx      base_idx\n0  2192372   8.076221  100000-query  2676668-base  3209652-base\n1  2177660  10.031794  100000-query  2676668-base  3181043-base\n2   342838  13.769163  100000-query  2676668-base   368296-base\n3   574649  14.486226  100000-query  2676668-base   645855-base\n4  1954150  18.285746  100000-query  2676668-base  2760762-base",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2192372</td>\n      <td>8.076221</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3209652-base</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2177660</td>\n      <td>10.031794</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3181043-base</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>342838</td>\n      <td>13.769163</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>368296-base</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>574649</td>\n      <td>14.486226</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>645855-base</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>1954150</td>\n      <td>18.285746</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>2760762-base</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# по внутреннему индексу FAISS восстанавливаем индекс рекомендованного товара в базовом наборе\n",
    "df_distances_cb['base_idx'] = df_distances_cb.apply(get_base_idx, axis=1)\n",
    "df_distances_cb.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "outputs": [
    {
     "data": {
      "text/plain": "       idx   distance     query_idx    target_idx      base_idx        0  \\\n0  2192372   8.076221  100000-query  2676668-base  3209652-base  1.15931   \n1  2177660  10.031794  100000-query  2676668-base  3181043-base  1.15931   \n2   342838  13.769163  100000-query  2676668-base   368296-base  1.15931   \n3   574649  14.486226  100000-query  2676668-base   645855-base  1.15931   \n4  1954150  18.285746  100000-query  2676668-base  2760762-base  1.15931   \n\n          1         2         3         4  ...        62        63        64  \\\n0 -0.904901  0.811955  1.043508 -0.012313  ...  0.115241  0.676228  0.275453   \n1 -0.904901  0.811955  1.043508 -0.012313  ...  0.115241  0.676228  0.275453   \n2 -0.904901  0.811955  1.043508 -0.012313  ...  0.115241  0.676228  0.275453   \n3 -0.904901  0.811955  1.043508 -0.012313  ...  0.115241  0.676228  0.275453   \n4 -0.904901  0.811955  1.043508 -0.012313  ...  0.115241  0.676228  0.275453   \n\n         65        66       67        68        69        70        71  \n0  0.453766  0.817488  0.69355  0.597167 -0.020121 -0.777845 -1.659674  \n1  0.453766  0.817488  0.69355  0.597167 -0.020121 -0.777845 -1.659674  \n2  0.453766  0.817488  0.69355  0.597167 -0.020121 -0.777845 -1.659674  \n3  0.453766  0.817488  0.69355  0.597167 -0.020121 -0.777845 -1.659674  \n4  0.453766  0.817488  0.69355  0.597167 -0.020121 -0.777845 -1.659674  \n\n[5 rows x 77 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>0</th>\n      <th>1</th>\n      <th>2</th>\n      <th>3</th>\n      <th>4</th>\n      <th>...</th>\n      <th>62</th>\n      <th>63</th>\n      <th>64</th>\n      <th>65</th>\n      <th>66</th>\n      <th>67</th>\n      <th>68</th>\n      <th>69</th>\n      <th>70</th>\n      <th>71</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2192372</td>\n      <td>8.076221</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3209652-base</td>\n      <td>1.15931</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.115241</td>\n      <td>0.676228</td>\n      <td>0.275453</td>\n      <td>0.453766</td>\n      <td>0.817488</td>\n      <td>0.69355</td>\n      <td>0.597167</td>\n      <td>-0.020121</td>\n      <td>-0.777845</td>\n      <td>-1.659674</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2177660</td>\n      <td>10.031794</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3181043-base</td>\n      <td>1.15931</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.115241</td>\n      <td>0.676228</td>\n      <td>0.275453</td>\n      <td>0.453766</td>\n      <td>0.817488</td>\n      <td>0.69355</td>\n      <td>0.597167</td>\n      <td>-0.020121</td>\n      <td>-0.777845</td>\n      <td>-1.659674</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>342838</td>\n      <td>13.769163</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>368296-base</td>\n      <td>1.15931</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.115241</td>\n      <td>0.676228</td>\n      <td>0.275453</td>\n      <td>0.453766</td>\n      <td>0.817488</td>\n      <td>0.69355</td>\n      <td>0.597167</td>\n      <td>-0.020121</td>\n      <td>-0.777845</td>\n      <td>-1.659674</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>574649</td>\n      <td>14.486226</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>645855-base</td>\n      <td>1.15931</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.115241</td>\n      <td>0.676228</td>\n      <td>0.275453</td>\n      <td>0.453766</td>\n      <td>0.817488</td>\n      <td>0.69355</td>\n      <td>0.597167</td>\n      <td>-0.020121</td>\n      <td>-0.777845</td>\n      <td>-1.659674</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>1954150</td>\n      <td>18.285746</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>2760762-base</td>\n      <td>1.15931</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.115241</td>\n      <td>0.676228</td>\n      <td>0.275453</td>\n      <td>0.453766</td>\n      <td>0.817488</td>\n      <td>0.69355</td>\n      <td>0.597167</td>\n      <td>-0.020121</td>\n      <td>-0.777845</td>\n      <td>-1.659674</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 77 columns</p>\n</div>"
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# добавляем координаты векторов-запросов\n",
    "df_distances_cb = df_distances_cb.merge(df_validation_scaled, how='inner', left_on='query_idx', right_index=True)\n",
    "df_distances_cb.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "outputs": [
    {
     "data": {
      "text/plain": "            idx   distance     query_idx    target_idx      base_idx  \\\n0       2192372   8.076221  100000-query  2676668-base  3209652-base   \n967706  2192372  19.433861  196770-query   976413-base  3209652-base   \n1       2177660  10.031794  100000-query  2676668-base  3181043-base   \n967703  2177660  17.835083  196770-query   976413-base  3181043-base   \n2        342838  13.769163  100000-query  2676668-base   368296-base   \n\n             0_x       1_x       2_x       3_x       4_x  ...      62_y  \\\n0       1.159310 -0.904901  0.811955  1.043508 -0.012313  ...  0.457506   \n967706  0.627229 -0.452532  1.104286  0.968336  0.144250  ...  0.457506   \n1       1.159310 -0.904901  0.811955  1.043508 -0.012313  ...  0.190811   \n967703  0.627229 -0.452532  1.104286  0.968336  0.144250  ...  0.190811   \n2       1.159310 -0.904901  0.811955  1.043508 -0.012313  ... -0.228085   \n\n            63_y      64_y      65_y      66_y      67_y      68_y      69_y  \\\n0       0.948873  0.570802  1.213205  0.909867  0.897679  0.018913  0.430446   \n967706  0.948873  0.570802  1.213205  0.909867  0.897679  0.018913  0.430446   \n1       0.477240 -0.017377  0.170011  0.557127  0.766064  0.048265  0.406469   \n967703  0.477240 -0.017377  0.170011  0.557127  0.766064  0.048265  0.406469   \n2       0.011799  0.883902  1.235784  0.495694  0.563696  0.170485 -0.324675   \n\n            70_y      71_y  \n0      -0.713847 -1.182590  \n967706 -0.713847 -1.182590  \n1      -0.713847 -1.284064  \n967703 -0.713847 -1.284064  \n2      -1.051440 -1.097352  \n\n[5 rows x 149 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>0_x</th>\n      <th>1_x</th>\n      <th>2_x</th>\n      <th>3_x</th>\n      <th>4_x</th>\n      <th>...</th>\n      <th>62_y</th>\n      <th>63_y</th>\n      <th>64_y</th>\n      <th>65_y</th>\n      <th>66_y</th>\n      <th>67_y</th>\n      <th>68_y</th>\n      <th>69_y</th>\n      <th>70_y</th>\n      <th>71_y</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2192372</td>\n      <td>8.076221</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3209652-base</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.457506</td>\n      <td>0.948873</td>\n      <td>0.570802</td>\n      <td>1.213205</td>\n      <td>0.909867</td>\n      <td>0.897679</td>\n      <td>0.018913</td>\n      <td>0.430446</td>\n      <td>-0.713847</td>\n      <td>-1.182590</td>\n    </tr>\n    <tr>\n      <th>967706</th>\n      <td>2192372</td>\n      <td>19.433861</td>\n      <td>196770-query</td>\n      <td>976413-base</td>\n      <td>3209652-base</td>\n      <td>0.627229</td>\n      <td>-0.452532</td>\n      <td>1.104286</td>\n      <td>0.968336</td>\n      <td>0.144250</td>\n      <td>...</td>\n      <td>0.457506</td>\n      <td>0.948873</td>\n      <td>0.570802</td>\n      <td>1.213205</td>\n      <td>0.909867</td>\n      <td>0.897679</td>\n      <td>0.018913</td>\n      <td>0.430446</td>\n      <td>-0.713847</td>\n      <td>-1.182590</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2177660</td>\n      <td>10.031794</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3181043-base</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.190811</td>\n      <td>0.477240</td>\n      <td>-0.017377</td>\n      <td>0.170011</td>\n      <td>0.557127</td>\n      <td>0.766064</td>\n      <td>0.048265</td>\n      <td>0.406469</td>\n      <td>-0.713847</td>\n      <td>-1.284064</td>\n    </tr>\n    <tr>\n      <th>967703</th>\n      <td>2177660</td>\n      <td>17.835083</td>\n      <td>196770-query</td>\n      <td>976413-base</td>\n      <td>3181043-base</td>\n      <td>0.627229</td>\n      <td>-0.452532</td>\n      <td>1.104286</td>\n      <td>0.968336</td>\n      <td>0.144250</td>\n      <td>...</td>\n      <td>0.190811</td>\n      <td>0.477240</td>\n      <td>-0.017377</td>\n      <td>0.170011</td>\n      <td>0.557127</td>\n      <td>0.766064</td>\n      <td>0.048265</td>\n      <td>0.406469</td>\n      <td>-0.713847</td>\n      <td>-1.284064</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>342838</td>\n      <td>13.769163</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>368296-base</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>-0.228085</td>\n      <td>0.011799</td>\n      <td>0.883902</td>\n      <td>1.235784</td>\n      <td>0.495694</td>\n      <td>0.563696</td>\n      <td>0.170485</td>\n      <td>-0.324675</td>\n      <td>-1.051440</td>\n      <td>-1.097352</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 149 columns</p>\n</div>"
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# добавляем координаты векторов-рекомендаций FAISS\n",
    "df_distances_cb = df_distances_cb.merge(df_base_scaled, how='inner', left_on='base_idx', right_index=True)\n",
    "df_distances_cb.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "outputs": [
    {
     "data": {
      "text/plain": "         distance       0_x       1_x       2_x       3_x       4_x       5_x  \\\n0        8.076221  1.159310 -0.904901  0.811955  1.043508 -0.012313 -0.329532   \n967706  19.433861  0.627229 -0.452532  1.104286  0.968336  0.144250 -0.268212   \n1       10.031794  1.159310 -0.904901  0.811955  1.043508 -0.012313 -0.329532   \n967703  17.835083  0.627229 -0.452532  1.104286  0.968336  0.144250 -0.268212   \n2       13.769163  1.159310 -0.904901  0.811955  1.043508 -0.012313 -0.329532   \n\n             6_x       7_x       8_x  ...      62_y      63_y      64_y  \\\n0      -0.253186  1.860318 -1.699665  ...  0.457506  0.948873  0.570802   \n967706  1.743702  1.355926 -1.102258  ...  0.457506  0.948873  0.570802   \n1      -0.253186  1.860318 -1.699665  ...  0.190811  0.477240 -0.017377   \n967703  1.743702  1.355926 -1.102258  ...  0.190811  0.477240 -0.017377   \n2      -0.253186  1.860318 -1.699665  ... -0.228085  0.011799  0.883902   \n\n            65_y      66_y      67_y      68_y      69_y      70_y      71_y  \n0       1.213205  0.909867  0.897679  0.018913  0.430446 -0.713847 -1.182590  \n967706  1.213205  0.909867  0.897679  0.018913  0.430446 -0.713847 -1.182590  \n1       0.170011  0.557127  0.766064  0.048265  0.406469 -0.713847 -1.284064  \n967703  0.170011  0.557127  0.766064  0.048265  0.406469 -0.713847 -1.284064  \n2       1.235784  0.495694  0.563696  0.170485 -0.324675 -1.051440 -1.097352  \n\n[5 rows x 145 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>distance</th>\n      <th>0_x</th>\n      <th>1_x</th>\n      <th>2_x</th>\n      <th>3_x</th>\n      <th>4_x</th>\n      <th>5_x</th>\n      <th>6_x</th>\n      <th>7_x</th>\n      <th>8_x</th>\n      <th>...</th>\n      <th>62_y</th>\n      <th>63_y</th>\n      <th>64_y</th>\n      <th>65_y</th>\n      <th>66_y</th>\n      <th>67_y</th>\n      <th>68_y</th>\n      <th>69_y</th>\n      <th>70_y</th>\n      <th>71_y</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>8.076221</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>-0.329532</td>\n      <td>-0.253186</td>\n      <td>1.860318</td>\n      <td>-1.699665</td>\n      <td>...</td>\n      <td>0.457506</td>\n      <td>0.948873</td>\n      <td>0.570802</td>\n      <td>1.213205</td>\n      <td>0.909867</td>\n      <td>0.897679</td>\n      <td>0.018913</td>\n      <td>0.430446</td>\n      <td>-0.713847</td>\n      <td>-1.182590</td>\n    </tr>\n    <tr>\n      <th>967706</th>\n      <td>19.433861</td>\n      <td>0.627229</td>\n      <td>-0.452532</td>\n      <td>1.104286</td>\n      <td>0.968336</td>\n      <td>0.144250</td>\n      <td>-0.268212</td>\n      <td>1.743702</td>\n      <td>1.355926</td>\n      <td>-1.102258</td>\n      <td>...</td>\n      <td>0.457506</td>\n      <td>0.948873</td>\n      <td>0.570802</td>\n      <td>1.213205</td>\n      <td>0.909867</td>\n      <td>0.897679</td>\n      <td>0.018913</td>\n      <td>0.430446</td>\n      <td>-0.713847</td>\n      <td>-1.182590</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>10.031794</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>-0.329532</td>\n      <td>-0.253186</td>\n      <td>1.860318</td>\n      <td>-1.699665</td>\n      <td>...</td>\n      <td>0.190811</td>\n      <td>0.477240</td>\n      <td>-0.017377</td>\n      <td>0.170011</td>\n      <td>0.557127</td>\n      <td>0.766064</td>\n      <td>0.048265</td>\n      <td>0.406469</td>\n      <td>-0.713847</td>\n      <td>-1.284064</td>\n    </tr>\n    <tr>\n      <th>967703</th>\n      <td>17.835083</td>\n      <td>0.627229</td>\n      <td>-0.452532</td>\n      <td>1.104286</td>\n      <td>0.968336</td>\n      <td>0.144250</td>\n      <td>-0.268212</td>\n      <td>1.743702</td>\n      <td>1.355926</td>\n      <td>-1.102258</td>\n      <td>...</td>\n      <td>0.190811</td>\n      <td>0.477240</td>\n      <td>-0.017377</td>\n      <td>0.170011</td>\n      <td>0.557127</td>\n      <td>0.766064</td>\n      <td>0.048265</td>\n      <td>0.406469</td>\n      <td>-0.713847</td>\n      <td>-1.284064</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>13.769163</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>-0.329532</td>\n      <td>-0.253186</td>\n      <td>1.860318</td>\n      <td>-1.699665</td>\n      <td>...</td>\n      <td>-0.228085</td>\n      <td>0.011799</td>\n      <td>0.883902</td>\n      <td>1.235784</td>\n      <td>0.495694</td>\n      <td>0.563696</td>\n      <td>0.170485</td>\n      <td>-0.324675</td>\n      <td>-1.051440</td>\n      <td>-1.097352</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 145 columns</p>\n</div>"
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# характеристики для CatBoost\n",
    "features_cb = df_distances_cb.drop(['idx', 'query_idx', 'target_idx', 'base_idx'], axis=1)\n",
    "features_cb.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "outputs": [
    {
     "data": {
      "text/plain": "        predict_proba\n0            0.017558\n1            0.002171\n2            0.019958\n3            0.003487\n4            0.497867\n...               ...\n999995       0.000110\n999996       0.000227\n999997       0.000044\n999998       0.007537\n999999       0.020359\n\n[1000000 rows x 1 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>predict_proba</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0.017558</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>0.002171</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>0.019958</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>0.003487</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>0.497867</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>999995</th>\n      <td>0.000110</td>\n    </tr>\n    <tr>\n      <th>999996</th>\n      <td>0.000227</td>\n    </tr>\n    <tr>\n      <th>999997</th>\n      <td>0.000044</td>\n    </tr>\n    <tr>\n      <th>999998</th>\n      <td>0.007537</td>\n    </tr>\n    <tr>\n      <th>999999</th>\n      <td>0.020359</td>\n    </tr>\n  </tbody>\n</table>\n<p>1000000 rows × 1 columns</p>\n</div>"
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Вероятности предсказания понадобятся для ранжирования ответов модели поиска ближайших соседей\n",
    "predictions = df_pipeline.predict_proba(features_cb)\n",
    "\n",
    "predictions = pd.DataFrame(predictions)\n",
    "predictions.drop([0], axis=1, inplace=True)\n",
    "predictions.columns=['predict_proba']\n",
    "predictions"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "<div class=\"alert alert-info\", padding: 15px>\n",
    "<strong>Комментарий студента:</strong>\n",
    "Вот здесь я и остановился. У меня есть понимание как действовать дальше - имея по 20 векторов (ближайших соседей) и вероятности их предсказания - стоит выбрать из этих 20-ти 5 наиболее вероятных векторов. Но дело в том, что я не понимаю как именно это реализовать в коде. Поэтому далее по совету товарищей решил просто выбрать 5 самых наиболее вероятных векторов из всего миллионного списка (что в корне не верное решение).\n",
    "\n",
    "Могу ли я расчитывать на подсказку \\ пример реализации данного хода?\n",
    "</div>"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "outputs": [
    {
     "data": {
      "text/plain": "            idx   distance     query_idx    target_idx      base_idx  \\\n0       2192372   8.076221  100000-query  2676668-base  3209652-base   \n967706  2192372  19.433861  196770-query   976413-base  3209652-base   \n1       2177660  10.031794  100000-query  2676668-base  3181043-base   \n967703  2177660  17.835083  196770-query   976413-base  3181043-base   \n2        342838  13.769163  100000-query  2676668-base   368296-base   \n\n             0_x       1_x       2_x       3_x       4_x  ...      63_y  \\\n0       1.159310 -0.904901  0.811955  1.043508 -0.012313  ...  0.948873   \n967706  0.627229 -0.452532  1.104286  0.968336  0.144250  ...  0.948873   \n1       1.159310 -0.904901  0.811955  1.043508 -0.012313  ...  0.477240   \n967703  0.627229 -0.452532  1.104286  0.968336  0.144250  ...  0.477240   \n2       1.159310 -0.904901  0.811955  1.043508 -0.012313  ...  0.011799   \n\n            64_y      65_y      66_y      67_y      68_y      69_y      70_y  \\\n0       0.570802  1.213205  0.909867  0.897679  0.018913  0.430446 -0.713847   \n967706  0.570802  1.213205  0.909867  0.897679  0.018913  0.430446 -0.713847   \n1      -0.017377  0.170011  0.557127  0.766064  0.048265  0.406469 -0.713847   \n967703 -0.017377  0.170011  0.557127  0.766064  0.048265  0.406469 -0.713847   \n2       0.883902  1.235784  0.495694  0.563696  0.170485 -0.324675 -1.051440   \n\n            71_y  predict_proba  \n0      -1.182590       0.017558  \n967706 -1.182590       0.057883  \n1      -1.284064       0.002171  \n967703 -1.284064       0.037669  \n2      -1.097352       0.019958  \n\n[5 rows x 150 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>idx</th>\n      <th>distance</th>\n      <th>query_idx</th>\n      <th>target_idx</th>\n      <th>base_idx</th>\n      <th>0_x</th>\n      <th>1_x</th>\n      <th>2_x</th>\n      <th>3_x</th>\n      <th>4_x</th>\n      <th>...</th>\n      <th>63_y</th>\n      <th>64_y</th>\n      <th>65_y</th>\n      <th>66_y</th>\n      <th>67_y</th>\n      <th>68_y</th>\n      <th>69_y</th>\n      <th>70_y</th>\n      <th>71_y</th>\n      <th>predict_proba</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>2192372</td>\n      <td>8.076221</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3209652-base</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.948873</td>\n      <td>0.570802</td>\n      <td>1.213205</td>\n      <td>0.909867</td>\n      <td>0.897679</td>\n      <td>0.018913</td>\n      <td>0.430446</td>\n      <td>-0.713847</td>\n      <td>-1.182590</td>\n      <td>0.017558</td>\n    </tr>\n    <tr>\n      <th>967706</th>\n      <td>2192372</td>\n      <td>19.433861</td>\n      <td>196770-query</td>\n      <td>976413-base</td>\n      <td>3209652-base</td>\n      <td>0.627229</td>\n      <td>-0.452532</td>\n      <td>1.104286</td>\n      <td>0.968336</td>\n      <td>0.144250</td>\n      <td>...</td>\n      <td>0.948873</td>\n      <td>0.570802</td>\n      <td>1.213205</td>\n      <td>0.909867</td>\n      <td>0.897679</td>\n      <td>0.018913</td>\n      <td>0.430446</td>\n      <td>-0.713847</td>\n      <td>-1.182590</td>\n      <td>0.057883</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2177660</td>\n      <td>10.031794</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>3181043-base</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.477240</td>\n      <td>-0.017377</td>\n      <td>0.170011</td>\n      <td>0.557127</td>\n      <td>0.766064</td>\n      <td>0.048265</td>\n      <td>0.406469</td>\n      <td>-0.713847</td>\n      <td>-1.284064</td>\n      <td>0.002171</td>\n    </tr>\n    <tr>\n      <th>967703</th>\n      <td>2177660</td>\n      <td>17.835083</td>\n      <td>196770-query</td>\n      <td>976413-base</td>\n      <td>3181043-base</td>\n      <td>0.627229</td>\n      <td>-0.452532</td>\n      <td>1.104286</td>\n      <td>0.968336</td>\n      <td>0.144250</td>\n      <td>...</td>\n      <td>0.477240</td>\n      <td>-0.017377</td>\n      <td>0.170011</td>\n      <td>0.557127</td>\n      <td>0.766064</td>\n      <td>0.048265</td>\n      <td>0.406469</td>\n      <td>-0.713847</td>\n      <td>-1.284064</td>\n      <td>0.037669</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>342838</td>\n      <td>13.769163</td>\n      <td>100000-query</td>\n      <td>2676668-base</td>\n      <td>368296-base</td>\n      <td>1.159310</td>\n      <td>-0.904901</td>\n      <td>0.811955</td>\n      <td>1.043508</td>\n      <td>-0.012313</td>\n      <td>...</td>\n      <td>0.011799</td>\n      <td>0.883902</td>\n      <td>1.235784</td>\n      <td>0.495694</td>\n      <td>0.563696</td>\n      <td>0.170485</td>\n      <td>-0.324675</td>\n      <td>-1.051440</td>\n      <td>-1.097352</td>\n      <td>0.019958</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 150 columns</p>\n</div>"
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# добавляем вероятности для предсказаний\n",
    "df_distances_cb = df_distances_cb.merge(\n",
    "    predictions.loc[:, 'predict_proba'],\n",
    "    how='inner',\n",
    "    left_index=True,\n",
    "    right_index=True\n",
    ")\n",
    "df_distances_cb.head()"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "outputs": [
    {
     "data": {
      "text/plain": "           query_idx      base_idx  predict_proba\n0       100000-query  3209652-base       0.017558\n967706  196770-query  3209652-base       0.057883\n1       100000-query  3181043-base       0.002171\n967703  196770-query  3181043-base       0.037669\n2       100000-query   368296-base       0.019958\n...              ...           ...            ...\n999986  199998-query  1938096-base       0.000212\n999987  199998-query  3296935-base       0.000538\n999988  199998-query  2736444-base       0.003771\n999993  199999-query  2385617-base       0.000129\n999995  199999-query  2503531-base       0.000110\n\n[1000000 rows x 3 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>query_idx</th>\n      <th>base_idx</th>\n      <th>predict_proba</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>100000-query</td>\n      <td>3209652-base</td>\n      <td>0.017558</td>\n    </tr>\n    <tr>\n      <th>967706</th>\n      <td>196770-query</td>\n      <td>3209652-base</td>\n      <td>0.057883</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>100000-query</td>\n      <td>3181043-base</td>\n      <td>0.002171</td>\n    </tr>\n    <tr>\n      <th>967703</th>\n      <td>196770-query</td>\n      <td>3181043-base</td>\n      <td>0.037669</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>100000-query</td>\n      <td>368296-base</td>\n      <td>0.019958</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>999986</th>\n      <td>199998-query</td>\n      <td>1938096-base</td>\n      <td>0.000212</td>\n    </tr>\n    <tr>\n      <th>999987</th>\n      <td>199998-query</td>\n      <td>3296935-base</td>\n      <td>0.000538</td>\n    </tr>\n    <tr>\n      <th>999988</th>\n      <td>199998-query</td>\n      <td>2736444-base</td>\n      <td>0.003771</td>\n    </tr>\n    <tr>\n      <th>999993</th>\n      <td>199999-query</td>\n      <td>2385617-base</td>\n      <td>0.000129</td>\n    </tr>\n    <tr>\n      <th>999995</th>\n      <td>199999-query</td>\n      <td>2503531-base</td>\n      <td>0.000110</td>\n    </tr>\n  </tbody>\n</table>\n<p>1000000 rows × 3 columns</p>\n</div>"
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# уберем координаты векторов\n",
    "recommendations = df_distances_cb.loc[:, ['query_idx', 'base_idx', 'predict_proba']]\n",
    "recommendations"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "outputs": [
    {
     "ename": "AttributeError",
     "evalue": "'PandasExprVisitor' object has no attribute 'visit_ListComp'",
     "output_type": "error",
     "traceback": [
      "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m",
      "\u001B[1;31mAttributeError\u001B[0m                            Traceback (most recent call last)",
      "Cell \u001B[1;32mIn[56], line 2\u001B[0m\n\u001B[0;32m      1\u001B[0m el \u001B[38;5;241m=\u001B[39m val_idx\u001B[38;5;241m.\u001B[39mtolist()[\u001B[38;5;241m2\u001B[39m]\n\u001B[1;32m----> 2\u001B[0m \u001B[43mrecommendations\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mquery\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;124;43m'\u001B[39;49m\u001B[38;5;124;43mbase_idx in [base_index[r] for r in el]\u001B[39;49m\u001B[38;5;124;43m'\u001B[39;49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\util\\_decorators.py:331\u001B[0m, in \u001B[0;36mdeprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper\u001B[1;34m(*args, **kwargs)\u001B[0m\n\u001B[0;32m    325\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28mlen\u001B[39m(args) \u001B[38;5;241m>\u001B[39m num_allow_args:\n\u001B[0;32m    326\u001B[0m     warnings\u001B[38;5;241m.\u001B[39mwarn(\n\u001B[0;32m    327\u001B[0m         msg\u001B[38;5;241m.\u001B[39mformat(arguments\u001B[38;5;241m=\u001B[39m_format_argument_list(allow_args)),\n\u001B[0;32m    328\u001B[0m         \u001B[38;5;167;01mFutureWarning\u001B[39;00m,\n\u001B[0;32m    329\u001B[0m         stacklevel\u001B[38;5;241m=\u001B[39mfind_stack_level(),\n\u001B[0;32m    330\u001B[0m     )\n\u001B[1;32m--> 331\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mfunc\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43margs\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\frame.py:4463\u001B[0m, in \u001B[0;36mDataFrame.query\u001B[1;34m(self, expr, inplace, **kwargs)\u001B[0m\n\u001B[0;32m   4461\u001B[0m kwargs[\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mlevel\u001B[39m\u001B[38;5;124m\"\u001B[39m] \u001B[38;5;241m=\u001B[39m kwargs\u001B[38;5;241m.\u001B[39mpop(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mlevel\u001B[39m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;241m0\u001B[39m) \u001B[38;5;241m+\u001B[39m \u001B[38;5;241m2\u001B[39m\n\u001B[0;32m   4462\u001B[0m kwargs[\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mtarget\u001B[39m\u001B[38;5;124m\"\u001B[39m] \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;01mNone\u001B[39;00m\n\u001B[1;32m-> 4463\u001B[0m res \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43meval\u001B[49m\u001B[43m(\u001B[49m\u001B[43mexpr\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n\u001B[0;32m   4465\u001B[0m \u001B[38;5;28;01mtry\u001B[39;00m:\n\u001B[0;32m   4466\u001B[0m     result \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mloc[res]\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\util\\_decorators.py:331\u001B[0m, in \u001B[0;36mdeprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper\u001B[1;34m(*args, **kwargs)\u001B[0m\n\u001B[0;32m    325\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28mlen\u001B[39m(args) \u001B[38;5;241m>\u001B[39m num_allow_args:\n\u001B[0;32m    326\u001B[0m     warnings\u001B[38;5;241m.\u001B[39mwarn(\n\u001B[0;32m    327\u001B[0m         msg\u001B[38;5;241m.\u001B[39mformat(arguments\u001B[38;5;241m=\u001B[39m_format_argument_list(allow_args)),\n\u001B[0;32m    328\u001B[0m         \u001B[38;5;167;01mFutureWarning\u001B[39;00m,\n\u001B[0;32m    329\u001B[0m         stacklevel\u001B[38;5;241m=\u001B[39mfind_stack_level(),\n\u001B[0;32m    330\u001B[0m     )\n\u001B[1;32m--> 331\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mfunc\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43margs\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\frame.py:4601\u001B[0m, in \u001B[0;36mDataFrame.eval\u001B[1;34m(self, expr, inplace, **kwargs)\u001B[0m\n\u001B[0;32m   4598\u001B[0m     kwargs[\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mtarget\u001B[39m\u001B[38;5;124m\"\u001B[39m] \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\n\u001B[0;32m   4599\u001B[0m kwargs[\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mresolvers\u001B[39m\u001B[38;5;124m\"\u001B[39m] \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mtuple\u001B[39m(kwargs\u001B[38;5;241m.\u001B[39mget(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mresolvers\u001B[39m\u001B[38;5;124m\"\u001B[39m, ())) \u001B[38;5;241m+\u001B[39m resolvers\n\u001B[1;32m-> 4601\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43m_eval\u001B[49m\u001B[43m(\u001B[49m\u001B[43mexpr\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43minplace\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43minplace\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\eval.py:353\u001B[0m, in \u001B[0;36meval\u001B[1;34m(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)\u001B[0m\n\u001B[0;32m    344\u001B[0m \u001B[38;5;66;03m# get our (possibly passed-in) scope\u001B[39;00m\n\u001B[0;32m    345\u001B[0m env \u001B[38;5;241m=\u001B[39m ensure_scope(\n\u001B[0;32m    346\u001B[0m     level \u001B[38;5;241m+\u001B[39m \u001B[38;5;241m1\u001B[39m,\n\u001B[0;32m    347\u001B[0m     global_dict\u001B[38;5;241m=\u001B[39mglobal_dict,\n\u001B[1;32m   (...)\u001B[0m\n\u001B[0;32m    350\u001B[0m     target\u001B[38;5;241m=\u001B[39mtarget,\n\u001B[0;32m    351\u001B[0m )\n\u001B[1;32m--> 353\u001B[0m parsed_expr \u001B[38;5;241m=\u001B[39m \u001B[43mExpr\u001B[49m\u001B[43m(\u001B[49m\u001B[43mexpr\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mengine\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mengine\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mparser\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mparser\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43menv\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43menv\u001B[49m\u001B[43m)\u001B[49m\n\u001B[0;32m    355\u001B[0m \u001B[38;5;66;03m# construct the engine and evaluate the parsed expression\u001B[39;00m\n\u001B[0;32m    356\u001B[0m eng \u001B[38;5;241m=\u001B[39m ENGINES[engine]\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:813\u001B[0m, in \u001B[0;36mExpr.__init__\u001B[1;34m(self, expr, engine, parser, env, level)\u001B[0m\n\u001B[0;32m    811\u001B[0m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mparser \u001B[38;5;241m=\u001B[39m parser\n\u001B[0;32m    812\u001B[0m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_visitor \u001B[38;5;241m=\u001B[39m PARSERS[parser](\u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39menv, \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mengine, \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mparser)\n\u001B[1;32m--> 813\u001B[0m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mterms \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mparse\u001B[49m\u001B[43m(\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:832\u001B[0m, in \u001B[0;36mExpr.parse\u001B[1;34m(self)\u001B[0m\n\u001B[0;32m    828\u001B[0m \u001B[38;5;28;01mdef\u001B[39;00m \u001B[38;5;21mparse\u001B[39m(\u001B[38;5;28mself\u001B[39m):\n\u001B[0;32m    829\u001B[0m \u001B[38;5;250m    \u001B[39m\u001B[38;5;124;03m\"\"\"\u001B[39;00m\n\u001B[0;32m    830\u001B[0m \u001B[38;5;124;03m    Parse an expression.\u001B[39;00m\n\u001B[0;32m    831\u001B[0m \u001B[38;5;124;03m    \"\"\"\u001B[39;00m\n\u001B[1;32m--> 832\u001B[0m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43m_visitor\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mvisit\u001B[49m\u001B[43m(\u001B[49m\u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mexpr\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:415\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    413\u001B[0m method \u001B[38;5;241m=\u001B[39m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mvisit_\u001B[39m\u001B[38;5;124m\"\u001B[39m \u001B[38;5;241m+\u001B[39m \u001B[38;5;28mtype\u001B[39m(node)\u001B[38;5;241m.\u001B[39m\u001B[38;5;18m__name__\u001B[39m\n\u001B[0;32m    414\u001B[0m visitor \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m, method)\n\u001B[1;32m--> 415\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mvisitor\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:421\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit_Module\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    419\u001B[0m     \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mSyntaxError\u001B[39;00m(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124monly a single expression is allowed\u001B[39m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m    420\u001B[0m expr \u001B[38;5;241m=\u001B[39m node\u001B[38;5;241m.\u001B[39mbody[\u001B[38;5;241m0\u001B[39m]\n\u001B[1;32m--> 421\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mvisit\u001B[49m\u001B[43m(\u001B[49m\u001B[43mexpr\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:415\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    413\u001B[0m method \u001B[38;5;241m=\u001B[39m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mvisit_\u001B[39m\u001B[38;5;124m\"\u001B[39m \u001B[38;5;241m+\u001B[39m \u001B[38;5;28mtype\u001B[39m(node)\u001B[38;5;241m.\u001B[39m\u001B[38;5;18m__name__\u001B[39m\n\u001B[0;32m    414\u001B[0m visitor \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m, method)\n\u001B[1;32m--> 415\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mvisitor\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:424\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit_Expr\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    423\u001B[0m \u001B[38;5;28;01mdef\u001B[39;00m \u001B[38;5;21mvisit_Expr\u001B[39m(\u001B[38;5;28mself\u001B[39m, node, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs):\n\u001B[1;32m--> 424\u001B[0m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mvisit\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mvalue\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:415\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    413\u001B[0m method \u001B[38;5;241m=\u001B[39m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mvisit_\u001B[39m\u001B[38;5;124m\"\u001B[39m \u001B[38;5;241m+\u001B[39m \u001B[38;5;28mtype\u001B[39m(node)\u001B[38;5;241m.\u001B[39m\u001B[38;5;18m__name__\u001B[39m\n\u001B[0;32m    414\u001B[0m visitor \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m, method)\n\u001B[1;32m--> 415\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mvisitor\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:723\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit_Compare\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    721\u001B[0m     op \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mtranslate_In(ops[\u001B[38;5;241m0\u001B[39m])\n\u001B[0;32m    722\u001B[0m     binop \u001B[38;5;241m=\u001B[39m ast\u001B[38;5;241m.\u001B[39mBinOp(op\u001B[38;5;241m=\u001B[39mop, left\u001B[38;5;241m=\u001B[39mnode\u001B[38;5;241m.\u001B[39mleft, right\u001B[38;5;241m=\u001B[39mcomps[\u001B[38;5;241m0\u001B[39m])\n\u001B[1;32m--> 723\u001B[0m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mvisit\u001B[49m\u001B[43m(\u001B[49m\u001B[43mbinop\u001B[49m\u001B[43m)\u001B[49m\n\u001B[0;32m    725\u001B[0m \u001B[38;5;66;03m# recursive case: we have a chained comparison, a CMP b CMP c, etc.\u001B[39;00m\n\u001B[0;32m    726\u001B[0m left \u001B[38;5;241m=\u001B[39m node\u001B[38;5;241m.\u001B[39mleft\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:415\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    413\u001B[0m method \u001B[38;5;241m=\u001B[39m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mvisit_\u001B[39m\u001B[38;5;124m\"\u001B[39m \u001B[38;5;241m+\u001B[39m \u001B[38;5;28mtype\u001B[39m(node)\u001B[38;5;241m.\u001B[39m\u001B[38;5;18m__name__\u001B[39m\n\u001B[0;32m    414\u001B[0m visitor \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m, method)\n\u001B[1;32m--> 415\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mvisitor\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:536\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit_BinOp\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    535\u001B[0m \u001B[38;5;28;01mdef\u001B[39;00m \u001B[38;5;21mvisit_BinOp\u001B[39m(\u001B[38;5;28mself\u001B[39m, node, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs):\n\u001B[1;32m--> 536\u001B[0m     op, op_class, left, right \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43m_maybe_transform_eq_ne\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[43m)\u001B[49m\n\u001B[0;32m    537\u001B[0m     left, right \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_maybe_downcast_constants(left, right)\n\u001B[0;32m    538\u001B[0m     \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_maybe_evaluate_binop(op, op_class, left, right)\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:458\u001B[0m, in \u001B[0;36mBaseExprVisitor._maybe_transform_eq_ne\u001B[1;34m(self, node, left, right)\u001B[0m\n\u001B[0;32m    456\u001B[0m     left \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mvisit(node\u001B[38;5;241m.\u001B[39mleft, side\u001B[38;5;241m=\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mleft\u001B[39m\u001B[38;5;124m\"\u001B[39m)\n\u001B[0;32m    457\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m right \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m:\n\u001B[1;32m--> 458\u001B[0m     right \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;43mself\u001B[39;49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mvisit\u001B[49m\u001B[43m(\u001B[49m\u001B[43mnode\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mright\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43mside\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mright\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m)\u001B[49m\n\u001B[0;32m    459\u001B[0m op, op_class, left, right \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39m_rewrite_membership_op(node, left, right)\n\u001B[0;32m    460\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m op, op_class, left, right\n",
      "File \u001B[1;32m~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\pandas\\core\\computation\\expr.py:414\u001B[0m, in \u001B[0;36mBaseExprVisitor.visit\u001B[1;34m(self, node, **kwargs)\u001B[0m\n\u001B[0;32m    411\u001B[0m         \u001B[38;5;28;01mraise\u001B[39;00m e\n\u001B[0;32m    413\u001B[0m method \u001B[38;5;241m=\u001B[39m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mvisit_\u001B[39m\u001B[38;5;124m\"\u001B[39m \u001B[38;5;241m+\u001B[39m \u001B[38;5;28mtype\u001B[39m(node)\u001B[38;5;241m.\u001B[39m\u001B[38;5;18m__name__\u001B[39m\n\u001B[1;32m--> 414\u001B[0m visitor \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mgetattr\u001B[39m(\u001B[38;5;28mself\u001B[39m, method)\n\u001B[0;32m    415\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m visitor(node, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mkwargs)\n",
      "\u001B[1;31mAttributeError\u001B[0m: 'PandasExprVisitor' object has no attribute 'visit_ListComp'"
     ]
    }
   ],
   "source": [
    "# Напишем функцию, которая будет проверять есть ли индекс ответа в общем списке индексов\n",
    "def is_in_base_index(base_index_list, row):\n",
    "    return row['base_idx'] in base_index_list"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "outputs": [
    {
     "data": {
      "text/plain": "           query_idx      base_idx  predict_proba\n0       100000-query  1542803-base       0.516372\n1       100000-query  2760762-base       0.497867\n2       100000-query  1822076-base       0.086825\n3       100000-query  2341758-base       0.040719\n4       100000-query  4728293-base       0.030295\n...              ...           ...            ...\n499995  199965-query  1295577-base       0.540749\n499996  199965-query  2834287-base       0.011458\n499997  199965-query   772785-base       0.010279\n499998  199965-query   239004-base       0.009264\n499999  199965-query  3280100-base       0.006950\n\n[500000 rows x 3 columns]",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>query_idx</th>\n      <th>base_idx</th>\n      <th>predict_proba</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>100000-query</td>\n      <td>1542803-base</td>\n      <td>0.516372</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>100000-query</td>\n      <td>2760762-base</td>\n      <td>0.497867</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>100000-query</td>\n      <td>1822076-base</td>\n      <td>0.086825</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>100000-query</td>\n      <td>2341758-base</td>\n      <td>0.040719</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>100000-query</td>\n      <td>4728293-base</td>\n      <td>0.030295</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>499995</th>\n      <td>199965-query</td>\n      <td>1295577-base</td>\n      <td>0.540749</td>\n    </tr>\n    <tr>\n      <th>499996</th>\n      <td>199965-query</td>\n      <td>2834287-base</td>\n      <td>0.011458</td>\n    </tr>\n    <tr>\n      <th>499997</th>\n      <td>199965-query</td>\n      <td>772785-base</td>\n      <td>0.010279</td>\n    </tr>\n    <tr>\n      <th>499998</th>\n      <td>199965-query</td>\n      <td>239004-base</td>\n      <td>0.009264</td>\n    </tr>\n    <tr>\n      <th>499999</th>\n      <td>199965-query</td>\n      <td>3280100-base</td>\n      <td>0.006950</td>\n    </tr>\n  </tbody>\n</table>\n<p>500000 rows × 3 columns</p>\n</div>"
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# оставим по 5 рекомендаций для каждого запроса\n",
    "results = pd.DataFrame()\n",
    "for query in recommendations['query_idx'].unique():\n",
    "    results = pd.concat([\n",
    "            results,\n",
    "            recommendations[\n",
    "                recommendations['query_idx'] == query\n",
    "                ].sort_values(['predict_proba'], ascending=False)[:5]\n",
    "        ],\n",
    "        ignore_index=True)\n",
    "results"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "outputs": [],
   "source": [
    "# присоединяем верные ответы\n",
    "results = results.merge(df_validation_answer, how='inner', left_on='query_idx', right_index=True)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "outputs": [],
   "source": [
    "# добавим столбец, который показывает,\n",
    "# совпадает ли рекомендованный вектор с указаниями экспертов\n",
    "results['accuracy'] = results['base_idx'] == results['Expected']\n",
    "results['accuracy'] = results['accuracy'].astype('int')"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "outputs": [
    {
     "data": {
      "text/plain": "34.97"
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# вычислим acc@5\n",
    "results['accuracy'].sum() / 100000 * 100"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Финальный вывод"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "В данной задаче требовалось разработать алгоритм, который для всех товаров из validation.csv предложит несколько вариантов наиболее похожих товаров из base. А также оценить качество алгоритма по метрике accuracy@5. Алгоритм был разработан и применён на валидационном датасете. Финальная метрика = 35%, однако может быть улучшена более чем в 2 раза, если немного доработать решение на финальном этапе."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Полезные ссылки\n",
    "\n",
    "- https://habr.com/ru/companies/vk/articles/338360/\n",
    "- https://scikit-learn.org/stable/modules/neighbors.html#unsupervised-neighbors\n",
    "- Подкаст https://music.yandex.ru/album/17951713/track/116375860\n",
    "- FAISS\n",
    "- https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/\n",
    "    - https://habr.com/ru/companies/okkamgroup/articles/509204/\n",
    "- https://evogeek.ru/articles/298310/\n",
    "- https://www.pinecone.io/learn/series/faiss/faiss-tutorial/\n",
    "- https://towardsdatascience.com/understanding-faiss-619bb6db2d1a\n",
    "- https://towardsdatascience.com/getting-started-with-faiss-93e19e887a0c\n",
    "- Annoy\n",
    "- https://erikbern.com/2015/09/24/nearest-neighbor-methods-vector-models-part-1\n",
    "- https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html\n",
    "- https://erikbern.com/2016/06/02/approximate-nearest-news.html\n",
    "- https://github.com/spotify/annoy"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [],
   "metadata": {
    "collapsed": false
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
